Abstract

We consider the problem of estimating, for a given representer $h$, the value $\ell_h(\beta)$
of a linear functional of the slope parameter $\beta$ in functional linear regression, where
scalar responses $Y_1, \dots, Y_n$ are modeled in dependence of random functions $X_1, \dots, X_n$.
The proposed estimators of $\ell_h(\beta)$ are based on dimension reduction and additional
thresholding. The minimax-optimal rate of convergence of the estimator is derived
assuming that the slope parameter and the representer belong to ellipsoids which
are in a certain sense linked to the covariance operator associated with the regressor.
We illustrate these results by considering Sobolev ellipsoids and finitely or infinitely
smoothing covariance operators.
1 Introduction

Functional data analysis (Ramsay and Silverman [2005] and Ferraty and Vieu [2006]) has 
become very important in a diverse range of disciplines including chemometrics (Frank and 
Friedman [1993]), econometrics (Forni and Reichlin [1998] and Preda and Saporta [2005]), 
biometry or climatology (Besse et al. [2000]). Roughly speaking, in all these applications 
the dependence of a scalar response variable, say $Y \in \mathbb{R}$, on the variation of an explanatory
random function $X$ is modeled by

\[ Y = \int_0^1 \beta(t) X(t)\,dt + \sigma\varepsilon, \qquad \sigma > 0, \tag{1.1} \]

for some slope function $\beta$ and error term $\varepsilon$. In recent years the nonparametric estimation
of the slope function $\beta$ given a sample of $(Y, X)$ has been of growing interest in the literature.
For example, Bosq [2000], Cardot et al. [2007] or Müller and Stadtmüller [2005]
consider a functional principal components regression, while a penalized least squares ap- 
proach combined with projection onto some basis (such as splines) is studied in Ramsay 
and Dalzell [1991], Eilers and Marx [1996], Cardot et al. [2003], Hall and Horowitz [2007] 
or Crambes et al. [2009]. However, the nonparametric estimation of $\beta$ leads in general to
an ill-posed inverse problem and hence all the proposed estimators have, under reasonable
assumptions, very poor rates of convergence. In other words, even relatively large sample
sizes may not be of much help in accurately estimating $\beta$. In contrast, it might be
possible to estimate certain local features of $\beta$, such as the value of a linear functional
$\ell_h(\beta) := \int_0^1 \beta(t) h(t)\,dt$ with respect to some given representer $h$, at the usual parametric
rate of convergence. For example, rather than estimating the slope parameter $\beta$ itself one
may be interested in its average value $\int_a^b \beta(t)\,dt$ over a certain interval $[a, b]$. Then it is
of interest to characterize the attainable accuracy of any estimator, for example, in terms
of the mean squared error (MSE), which obviously depends on the representer $h$ and the
conditions imposed on $\beta$. It is worth noting that the nonparametric estimation of the
value of a linear functional from Gaussian white noise observations is the subject of a
considerable literature (cf. Speckman [1979], Li [1982] or Ibragimov and Has'minskii [1984]
in case of direct observations, while in case of indirect observations we refer to Donoho
and Low [1992], Donoho [1994] or Goldenshluger and Pereverzev [2000] and references
therein). However, as far as we know this question has not yet been addressed in functional
linear regression, which in general is not a Gaussian white noise model. The objective
of this paper is the nonparametric estimation of the value $\ell_h(\beta)$ of a linear functional
based on an independent and identically distributed (i.i.d.) sample of $(Y, X)$ obeying (1.1).

In this paper we suppose that the random function $X$ takes its values in $L^2[0,1]$,
which is endowed with the usual norm $\|\cdot\|$ and inner product $\langle\cdot,\cdot\rangle$, and that $X$ has a finite
second moment, i.e., $E\|X\|^2 < \infty$. In order to simplify notation we assume that the mean
function of $X$ is zero. Moreover, the random function $X$ and the error term $\varepsilon$ are uncorrelated,
where $\varepsilon$ has mean zero and variance one. This situation has been considered, for
example, in Bosq [2000], Cardot et al. [2003] or Cardot et al. [2007]. Then multiplying both
sides in (1.1) by $X(s)$ and taking the expectation leads to the continuous equivalent of the
normal equation in a classical multivariate linear model. That is,

\[ g(s) := E[Y X(s)] = \int_0^1 \beta(t)\,\mathrm{Cov}(X(t), X(s))\,dt =: [T_{\mathrm{cov}}\beta](s), \qquad s \in [0,1], \tag{1.2} \]

where $g$ belongs to $L^2[0,1]$ and $T_{\mathrm{cov}}$ denotes the covariance operator associated with the
random function $X$. Estimation of $\beta$ is thus linked with the inversion of the covariance
operator $T_{\mathrm{cov}}$ of $X$ and is hence called an inverse problem. Moreover, due to the finite second
moment of the regressor $X$ the associated covariance operator $T_{\mathrm{cov}}$ is a nonnegative nuclear
operator (cf. Dauxois et al. [1982]). Consequently, unlike in a multivariate linear model,
a continuous generalized inverse of $T_{\mathrm{cov}}$ does not exist as long as the range of the operator
$T_{\mathrm{cov}}$ is an infinite dimensional subspace of $L^2[0,1]$. This corresponds to the setup of
ill-posed inverse problems (with the additional difficulty that $T_{\mathrm{cov}}$ is unknown and hence has
to be estimated). In what follows we always assume that there exists a unique solution
$\beta \in L^2[0,1]$ of equation (1.2), i.e., $g$ belongs to the range $\mathcal{R}(T_{\mathrm{cov}})$ of $T_{\mathrm{cov}}$, and that the
null space $\mathcal{N}(T_{\mathrm{cov}})$ of $T_{\mathrm{cov}}$ is trivial or, equivalently, $T_{\mathrm{cov}}$ is strictly positive (for a detailed
discussion in the context of inverse problems see Chapter 2.1 in Engl et al. [2000], while in
the special case of a functional linear model we refer to Cardot et al. [2003]). Furthermore,
we suppose that the representer $h$ of the linear functional $\ell_h$ of interest is an element of
$L^2[0,1]$ as well. Then it is straightforward to see that the value of the linear functional
$\ell_h(\beta)$ is identified if and only if $h$ belongs to the orthogonal complement $\mathcal{N}(T_{\mathrm{cov}})^\perp$ of the
null space $\mathcal{N}(T_{\mathrm{cov}})$. Hence, for all $h \in L^2[0,1]$ the identification is in particular guaranteed
under the assumption of a strictly positive covariance operator $T_{\mathrm{cov}}$.
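The normal equation (1.2) can be checked numerically in coefficient form. The following Python sketch is an illustration only and not part of the paper: it assumes that the regressor is expanded in an orthonormal basis of eigenfunctions of the covariance operator, that the coefficients of $X$ are independent Gaussian variables, and that the expansion is truncated after 20 terms. The function name `simulate_coefficients` and the sequences `lam` and `beta` are hypothetical choices.

```python
import numpy as np

def simulate_coefficients(n, lam, beta, sigma, rng):
    """Draw n i.i.d. copies of (Y, [X]) in coefficient form.

    Assumption of this sketch: the coefficients [X]_j are independent
    N(0, lam_j), so the covariance operator is diagonal with entries lam_j.
    """
    X = rng.standard_normal((n, lam.size)) * np.sqrt(lam)  # row i holds [X_i]
    eps = rng.standard_normal(n)                            # mean 0, variance 1
    Y = X @ beta + sigma * eps                              # model (1.1) in coefficients
    return Y, X

rng = np.random.default_rng(0)
j = np.arange(1, 21)
lam = j ** -2.0    # hypothetical eigenvalues of the covariance operator
beta = j ** -1.5   # hypothetical coefficients of the slope function
Y, X = simulate_coefficients(100_000, lam, beta, sigma=0.5, rng=rng)

g_hat = X.T @ Y / Y.size   # empirical version of [g]_j = E[Y [X]_j]
g_true = lam * beta        # normal equation (1.2): [g]_j = lam_j * [beta]_j
max_error = float(np.max(np.abs(g_hat - g_true)))
```

Under these assumptions the empirical cross moments `g_hat` approximate `lam * beta`, while recovering `beta` itself requires dividing by the decaying sequence `lam`, which is the source of the ill-posedness discussed above.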
In this paper we follow an approach often used in the literature to construct an estimator
of the value of a linear functional. That is, we replace in $\ell_h(\beta)$ the unknown slope
function $\beta$ by an estimator. In the particular case of second order stationary regressors
(defined below) the considered estimator of $\beta$ is just an orthogonal series estimator with an
additional thresholding in the Fourier domain (Johannes [2009b]). Note that over relatively
short periods of time the assumption of second order stationarity is realistic in many situations
and, moreover, it can be checked from the data by estimating the covariance function
using the multiple realizations of $X$. It is remarkable that in this situation under
mild moment assumptions the obtained plug-in estimator of $\ell_h(\beta)$ attains minimax-optimal
rates of convergence in terms of the MSE over a wide range of ellipsoids (defined below)
characterizing the prior information about slope parameter and representer respectively,
and which are linked (defined below) to the covariance operator $T_{\mathrm{cov}}$. In particular, we
illustrate these results by considering Sobolev ellipsoids and finitely or infinitely smoothing
covariance operators. In a second step we drop this assumption and no
longer suppose that the regressor is second order stationary. In this general setting the
estimator of $\beta$ is based on a dimension reduction together with an additional thresholding
(Cardot and Johannes [2008]). Then we show under stronger moment assumptions that
the plug-in estimator of $\ell_h(\beta)$ still attains minimax-optimal rates of convergence in terms
of the MSE but only over a more restrictive range of ellipsoids for $\beta$ and $h$ respectively.

The paper is organized in the following way. In Section 2 we introduce our basic assump- 
tions and derive a lower bound for estimating the value of a linear functional based on an 
i.i.d. sample obeying the functional linear model (1.1). In Section 3 under the assumption 
of second order stationarity we show first consistency of the proposed estimator and second 
its minimax-optimality. The general case without the second order stationarity assumption 
is then considered in Section 4. All proofs can be found in the Appendix. 

2 Complexity of local estimation: a lower bound. 

2.1 Notations and assumptions. 

In this section we show that the obtainable accuracy of any estimator of the value $\ell_h(\beta)$ of a
linear functional can be essentially determined by additional regularity conditions imposed
on the slope parameter $\beta$, the representer $h$ and the covariance operator $T_{\mathrm{cov}}$. In this paper
these conditions are characterized through different weighted norms in $L^2[0,1]$ with respect
to a pre-specified orthonormal basis $\{\psi_j, j \in \mathbb{N}\}$ in $L^2[0,1]$, which we formalize now. We
stress that this basis does not necessarily correspond to the eigenfunctions of $T_{\mathrm{cov}}$. Then,
given a strictly positive sequence of weights $w := (w_j)_{j \ge 1}$ and a constant $c > 0$, denote for
all $r \in \mathbb{R}$ by $\mathcal{F}_{w^r}^c$ the ellipsoid given by

\[ \mathcal{F}_{w^r}^c := \Bigl\{ f \in L^2[0,1] : \sum_{j=1}^{\infty} w_j^r\,|\langle f, \psi_j\rangle|^2 =: \|f\|_{w^r}^2 \le c \Bigr\}. \tag{2.1} \]

Furthermore, let $\mathcal{F}_{w^r} := \{ f \in L^2[0,1] : \|f\|_{w^r}^2 < \infty \}$. It is worth noting that in case $w \equiv 1$
the set $\mathcal{F}_{w}^c$ denotes an ellipsoid in $L^2[0,1]$ and hence does not impose additional restrictions.

Minimal regularity conditions. Let $\gamma := (\gamma_j)_{j \ge 1}$ and $\omega := (\omega_j)_{j \ge 1}$ denote two
sequences of weights. Then we suppose, here and subsequently, that the slope function $\beta$
belongs to the ellipsoid $\mathcal{F}_\gamma^\rho$ for some $\rho > 0$ and that the representer $h$ of the linear
functional $\ell_h$ is an element of the ellipsoid $\mathcal{F}_\omega^\tau$ for some $\tau > 0$. The ellipsoids $\mathcal{F}_\gamma^\rho$ and $\mathcal{F}_\omega^\tau$
capture all the prior information (such as smoothness) about the unknown slope function $\beta$
and the given representer $h$ respectively. Furthermore, as usual in the context of ill-posed
inverse problems, we link the mapping properties of the covariance operator $T_{\mathrm{cov}}$ and the
regularity conditions on $\beta$ and $h$. Therefore, consider the sequence $(\langle T_{\mathrm{cov}}\psi_j, \psi_j\rangle)_{j \ge 1}$, which
is summable and hence converges to zero since $T_{\mathrm{cov}}$ is nuclear. In what follows we impose
restrictions on the decay of this sequence. Denote by $\mathcal{N}$ the set of all strictly positive
nuclear operators defined on $L^2[0,1]$. Then given a sequence of weights $v := (v_j)_{j \ge 1}$ and $d \ge 1$
define the subset $\mathcal{N}_v^d$ of $\mathcal{N}$ by

\[ \mathcal{N}_v^d := \bigl\{ T \in \mathcal{N} : \|f\|_{v^2}^2 / d^2 \le \|Tf\|^2 \le d^2\,\|f\|_{v^2}^2, \ \forall f \in L^2[0,1] \bigr\}. \tag{2.2} \]

Notice that for all $T \in \mathcal{N}_v^d$ it follows that$^1$ $\langle T\psi_j, \psi_j\rangle \asymp_d v_j$. Hence, the sequence $(v_j)_{j \ge 1}$ has
to be strictly positive and summable since $T$ is strictly positive and nuclear. Moreover, if
$\{\lambda_j, \psi_j, j \ge 1\}$ is a spectral decomposition of $T \in \mathcal{N}$, then the condition $T \in \mathcal{N}_v^d$ is satisfied
if and only if $\lambda_j \asymp_d v_j$. In what follows the results are derived under regularity conditions
on the slope parameter $\beta$, the representer $h$ and the covariance operator $T_{\mathrm{cov}}$ described
through the sequences $\gamma$, $\omega$ and $v$ respectively. However, we provide below illustrations of
these conditions by assuming a "regular decay" of these sequences. The next assumption
summarizes our minimal regularity conditions on these sequences.

Assumption 2.1. Let $\gamma := (\gamma_j)_{j \ge 1}$, $\omega := (\omega_j)_{j \ge 1}$ and $v := (v_j)_{j \ge 1}$ be strictly positive
sequences of weights with $\gamma_1 = 1$, $\omega_1 = 1$ and $v_1 = 1$ such that $\gamma$ and $\omega$ are nondecreasing
and $v$ is nonincreasing with $\Lambda := \sum_{j \ge 1} v_j < \infty$. Furthermore, there exists a constant $D \ge 1$
such that $\sup_{1 \le j \le m} \{1/(v_j^k\,\omega_j)\} \le D \max(1/(\omega_m v_m^k), 1)$ for all $m \in \mathbb{N}$ and $k = 1, 2$.

We stress that $\mathcal{F}_\gamma^\rho$ is just an ellipsoid in $L^2[0,1]$ in case $\gamma \equiv 1$; hence in this
situation no additional regularity condition is imposed on the slope parameter $\beta$.
Furthermore, the last condition in Assumption 2.1 is obviously satisfied with $D = 1$ if the
sequence $(v_j\omega_j)$ is either monotonically decreasing or increasing.

Matrix and operator notations. Given $m \ge 1$, $\Psi_m$ denotes the subspace of $L^2[0,1]$
spanned by the functions $\{\psi_1, \dots, \psi_m\}$. $\Pi_m$ and $\Pi_m^\perp$ denote the orthogonal projections on
$\Psi_m$ and its orthogonal complement $\Psi_m^\perp$, respectively. Given an operator (matrix) $K$, $\|K\|$
denotes its operator norm. The inverse operator (matrix) of $K$ is denoted by $K^{-1}$. The
identity operator (matrix) is denoted by $I$ and the diagonal matrix with vector of entries $v$
is denoted by $\mathrm{Diag}(v)$. $[f]$ and $[K]$ denote the (infinite) vector and matrix of the function
$f$ and the operator $K$ with entries $[f]_j = \langle f, \psi_j\rangle$ and $[K]_{j,i} = \langle K\psi_i, \psi_j\rangle$ respectively. The
upper $m$ subvector and $m \times m$ submatrix of $[f]$ and $[K]$ are denoted by $[f]_m$ and $[K]_m$,
respectively. Clearly, $[\Pi_m f]_m = [f]_m$ and if we restrict $\Pi_m K \Pi_m$ to an operator from $\Psi_m$
into itself, then it has the matrix $[K]_m$.

Consider the covariance operator $T_{\mathrm{cov}}$ given in (1.2). We assume throughout the paper
that $T_{\mathrm{cov}}$ is strictly positive definite and hence the matrix $[T_{\mathrm{cov}}]_m$ is nonsingular for all
$m \in \mathbb{N}$, so that $[T_{\mathrm{cov}}]_m^{-1}$ always exists. Under this assumption the notation $(T_{\mathrm{cov}})_m^{-1}$ is used
for the operator from $L^2[0,1]$ into itself, whose matrix in the basis $\{\psi_j\}$ has the entries
$([T_{\mathrm{cov}}]_m^{-1})_{i,j}$ for $1 \le i, j \le m$ and zeros otherwise.

$^1$ We write $a \asymp_d b$ if $d^{-1} \le b/a \le d$.
Moment assumptions. The results derived below involve additional conditions on the
moments of the random function $X$, which we formalize now. Since $[T_{\mathrm{cov}}]_m$ is nonsingular
it follows that the random vector $[T_{\mathrm{cov}}]_m^{-1/2}[X]_m$ has uncorrelated entries $([T_{\mathrm{cov}}]_m^{-1/2}[X]_m)_j$,
$j = 1, \dots, m$, with mean zero and variance one. Thereby, for all $m \in \mathbb{N}$ and $z \in \mathbb{S}^m :=
\{z \in \mathbb{R}^m : z^t z = 1\}$ the centered random variable $\sum_{j=1}^m z_j ([T_{\mathrm{cov}}]_m^{-1/2}[X]_m)_j$ has variance one
too. Furthermore, $[T_{\mathrm{cov}}]_{j,j}$ is the variance of the centered random variable $[X]_j$, $j \in \mathbb{N}$, and
$\mathrm{tr}(T_{\mathrm{cov}}) = \sum_{j \in \mathbb{N}} [T_{\mathrm{cov}}]_{j,j} = E\|X\|^2 < \infty$. Let $\mathcal{X}$ be the set of all centered random functions
$X$ with finite second moment, i.e., the associated covariance operator $T_{\mathrm{cov}}$ satisfies $\mathrm{tr}(T_{\mathrm{cov}}) < \infty$.
Here and subsequently, we denote by $\mathcal{X}_\eta^k$, $k \in \mathbb{N}$, $\eta \ge 1$, the subset of $\mathcal{X}$ containing
only random functions $X$ such that the $k$-th moments of the corresponding random variables
$[X]_j / [T_{\mathrm{cov}}]_{j,j}^{1/2}$, $j \in \mathbb{N}$, and $\sum_{j=1}^m z_j ([T_{\mathrm{cov}}]_m^{-1/2}[X]_m)_j$ are uniformly bounded in $z \in \mathbb{S}^m$ and
$m \in \mathbb{N}$, that is

\[ \mathcal{X}_\eta^k := \Bigl\{ X \in \mathcal{X} \text{ such that } \sup_{j \in \mathbb{N}} E\bigl| [X]_j / [T_{\mathrm{cov}}]_{j,j}^{1/2} \bigr|^k \le \eta^k \text{ and } \sup_{m \in \mathbb{N}} \sup_{z \in \mathbb{S}^m} E\Bigl| \sum_{j=1}^m z_j ([T_{\mathrm{cov}}]_m^{-1/2}[X]_m)_j \Bigr|^k \le \eta^k \Bigr\}. \tag{2.3} \]

It is worth noting that in case $X \in \mathcal{X}$ is a Gaussian random function the corresponding
random variables $[X]_j / [T_{\mathrm{cov}}]_{j,j}^{1/2}$, $j \in \mathbb{N}$, and $\sum_{j=1}^m z_j ([T_{\mathrm{cov}}]_m^{-1/2}[X]_m)_j$, $z \in \mathbb{S}^m$, $m \in \mathbb{N}$, are
Gaussian with mean zero and variance one. Hence, for each $k \in \mathbb{N}$ there exists $\eta$ such that
any Gaussian random function $X \in \mathcal{X}$ belongs also to $\mathcal{X}_\eta^k$. In what follows, $\mathcal{E}_\eta^k$ stands for the
set of all centered error terms $\varepsilon$ with variance one and finite $k$-th moment, i.e., $E|\varepsilon|^k \le \eta^k$.

2.2 The lower bound. 

In the proof of the next theorem we show that a one-dimensional subproblem captures
the full difficulty in estimating a linear functional $\ell_h(\beta)$ of the slope parameter $\beta$. In other
words, there exist two sequences of slope functions $\beta_{1,n}, \beta_{2,n} \in \mathcal{F}_\gamma^\rho$, which are statistically not
consistently distinguishable, and a sequence of representers $h_n \in \mathcal{F}_\omega^\tau$ such that
$|\ell_{h_n}(\beta_{1,n}) - \ell_{h_n}(\beta_{2,n})|^2 \ge C\delta_*$, where $\delta_*$ is the optimal rate of convergence. Moreover, we obtain the
following lower bound under the additional assumption that the error term $\varepsilon$ is standard
normally distributed, i.e., $\varepsilon \sim \mathcal{N}(0, 1)$, and independent of the regressor $X$.

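Theorem 2.1. Assume an $n$-sample of $(Y, X)$ obeying (1.1) with $\sigma > 0$. Suppose that the
error term $\varepsilon \sim \mathcal{N}(0, 1)$ and the regressor $X \in \mathcal{X}_\eta^k$, $\eta \ge 1$, $k \in \mathbb{N}$, with associated covariance
operator $T_{\mathrm{cov}} \in \mathcal{N}_v^d$, $d \ge 1$, are independent. Let $m_* := m_*(n) \in \mathbb{N}$ and $\delta_* := \delta_*(m_*) \in \mathbb{R}^+$
be such that for some $\Delta \ge 1$

\[ 1/\Delta \le \frac{\gamma_{m_*}}{n\,v_{m_*}} \le \Delta \qquad\text{and}\qquad \delta_* := \frac{1}{\gamma_{m_*}\,\omega_{m_*}}. \tag{2.4} \]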

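If the sequences $\gamma$, $\omega$ and $v$ satisfy in addition Assumption 2.1, then for any estimator
$\tilde\ell$ we have

\[ \sup_{\beta \in \mathcal{F}_\gamma^\rho}\ \sup_{h \in \mathcal{F}_\omega^\tau} \bigl\{ E|\tilde\ell - \ell_h(\beta)|^2 \bigr\} \ge \frac{\tau}{4}\,\min\Bigl(\frac{\sigma^2}{2d}, \frac{\rho}{\Delta}\Bigr)\,\max(\delta_*, n^{-1}). \]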
Remark 2.1. The normality and independence assumption on the error term in the last
theorem is only used to simplify the calculation of the distance between distributions
corresponding to different slope functions. Below we derive an upper bound assuming that
the error term $\varepsilon \in \mathcal{E}_\eta^k$ and the regressor $X$ are uncorrelated. Obviously in this situation
Theorem 2.1 provides a lower bound for any estimator as long as the moment restrictions
do not exclude a Gaussian error. Furthermore, the lower bound tends to zero only if
$(\omega_j\gamma_j)_{j \ge 1}$ is a divergent sequence. In other words, in case $\gamma \equiv 1$, i.e., without any additional
restriction on $\beta$, consistency of an estimator of $\ell_h(\beta)$ uniformly over all $\beta \in L^2[0,1]$ is
only possible under restrictions on the representer, that is, $\omega$ is a divergent sequence. This
obviously reflects the ill-posedness of the underlying inverse problem. Finally, it is worth
noting that independent of the regularity condition on the slope parameter, i.e., even $\gamma \equiv 1$ is
possible, the lower bound tends to zero with parametric rate $1/n$ if and only if the sequence
$(\omega_j v_j)_{j \ge 1}$ is bounded away from zero. □



3 The case of second order stationary regressors. 

We assume in this section that the regressor $X$ is second order stationary, i.e., there exists
a positive definite function $c : [-1, 1] \to \mathbb{R}$ such that $\mathrm{Cov}(X(t), X(s)) = c(t - s)$, $s, t \in [0, 1]$.
Then it is shown in Johannes [2009b] that the eigenfunctions of the covariance operator
$T_{\mathrm{cov}}$ associated with $X$ are given by the trigonometric basis

\[ \psi_1 :\equiv 1, \quad \psi_{2j}(s) := \sqrt{2}\cos(2\pi j s), \quad \psi_{2j+1}(s) := \sqrt{2}\sin(2\pi j s), \quad s \in [0,1],\ j \in \mathbb{N}, \tag{3.1} \]

and that the corresponding strictly positive, possibly not ordered eigenvalues satisfy

\[ \lambda_1 = \int c(s)\,ds, \qquad \lambda_{2j} = \lambda_{2j+1} = \int \cos(2\pi j s)\,c(s)\,ds, \quad j \in \mathbb{N}. \tag{3.2} \]

The eigenfunctions are thus known to the statistician and only the eigenvalues depend on
the unknown covariance function $c(\cdot)$, i.e., have to be estimated. Therefore, we suppose in
this section that the pre-specified basis $\{\psi_j, j \ge 1\}$ is given by the trigonometric functions.
Notice that in this situation for each $m \ge 1$ the matrix $[T_{\mathrm{cov}}]_m$ is diagonal with diagonal
entries $[T_{\mathrm{cov}}]_{j,j} = \lambda_j$, $1 \le j \le m$.

Definition of the estimator. Since $\{[T_{\mathrm{cov}}]_{j,j}, \psi_j, j \ge 1\}$ provides a spectral decomposition
of the covariance operator $T_{\mathrm{cov}}$ defining the normal equation (1.2), it follows that the
linear functional $\ell_h(\beta)$ with given representer $h$ can be rewritten as follows

\[ \ell_h(\beta) = \langle \beta, h\rangle = \sum_{j=1}^{\infty} \frac{[h]_j\,[g]_j}{[T_{\mathrm{cov}}]_{j,j}} \quad\text{with } [g]_j = \langle g, \psi_j\rangle \text{ and } [h]_j = \langle h, \psi_j\rangle,\ j \ge 1. \tag{3.3} \]

It is well-known that even in case of an a priori known sequence of eigenvalues $([T_{\mathrm{cov}}]_{j,j})_{j \ge 1}$
replacing in (3.3) the unknown function $g$ by a consistent estimator $\hat g$ does in general not lead
to a consistent estimator of $\ell_h(\beta)$. Therefore, a regularization step is necessary. We follow
the approach presented in Johannes [2009b] (where the objective was the estimation of $\beta$
itself). That is, we introduce a dimension reduction together with an additional thresholding
in the Fourier domain. To be more precise, we replace the unknown quantities $[g]_j$ and
$[T_{\mathrm{cov}}]_{j,j}$ in equation (3.3) by their empirical counterparts. That is, if $(Y_1, X_1), \dots, (Y_n, X_n)$
denotes an i.i.d. sample of $(Y, X)$, then we consider the estimators

\[ \hat g := \frac{1}{n}\sum_{i=1}^{n} Y_i X_i \qquad\text{and}\qquad \hat T_{\mathrm{cov}} := \frac{1}{n}\sum_{i=1}^{n} \langle \cdot, X_i\rangle X_i \tag{3.4} \]

for $g$ and $T_{\mathrm{cov}}$ respectively. The orthogonal series estimator of the linear functional $\ell_h(\beta)$
with given representer $h$ is then defined by

\[ \hat\ell_h := \sum_{j=1}^{m} \frac{[h]_j\,[\hat g]_j}{[\hat T_{\mathrm{cov}}]_{j,j}}\; \mathbf{1}\bigl\{[\hat T_{\mathrm{cov}}]_{j,j} \ge 1/\alpha\bigr\} \tag{3.5} \]

where the dimension parameter $m = m(n)$ and the threshold $\alpha = \alpha(n)$ have to tend to
infinity as the sample size $n$ increases. Note that we introduce an additional threshold $\alpha$
on each estimated eigenvalue $[\hat T_{\mathrm{cov}}]_{j,j}$, since it could be arbitrarily close to zero even in
case that the true eigenvalue $[T_{\mathrm{cov}}]_{j,j}$ is sufficiently far away from zero. Thresholding in
the Fourier domain has been used, for example, in a deconvolution problem in Mair and
Ruymgaart [1996], Neumann [1997] or Johannes [2009a] and coincides with an approach
called spectral cut-off in the numerical analysis literature (cf. Tautenhahn [1996]).
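The construction in (3.4) and (3.5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the coefficients of each $X_i$ and of $h$ with respect to the trigonometric basis are assumed to be precomputed and passed as arrays, the indicator is read with a non-strict inequality, and the function name `thresholded_series_estimator` is hypothetical.

```python
import numpy as np

def thresholded_series_estimator(Y, X_coef, h_coef, m, alpha):
    """Plug-in estimate of the linear functional along the lines of (3.5).

    Y       : (n,) responses
    X_coef  : (n, J) array, row i holds the basis coefficients [X_i]_j
    h_coef  : (J,) basis coefficients [h]_j of the representer
    m       : dimension parameter, m <= J
    alpha   : threshold; a term is kept only if the estimated eigenvalue
              is at least 1/alpha
    """
    n = Y.shape[0]
    Xm = X_coef[:, :m]
    g_hat = Xm.T @ Y / n                 # [g_hat]_j = (1/n) sum_i Y_i [X_i]_j
    lam_hat = np.mean(Xm ** 2, axis=0)   # diagonal entries of the empirical covariance
    keep = lam_hat >= 1.0 / alpha        # thresholding in the Fourier domain
    terms = np.zeros(m)
    terms[keep] = h_coef[:m][keep] * g_hat[keep] / lam_hat[keep]
    return float(terms.sum())

# Tiny noiseless example with an orthogonal design and slope coefficients (3, -1).
Y = np.array([3.0, -2.0])
X_coef = np.array([[1.0, 0.0], [0.0, 2.0]])
h_coef = np.array([1.0, 1.0])
estimate = thresholded_series_estimator(Y, X_coef, h_coef, m=2, alpha=10.0)
```

In this toy example the estimated eigenvalues are 0.5 and 2, both above the cut-off 1/10, so the estimate equals the sum of the slope coefficients. Lowering `alpha` to 1 would discard the first term because 0.5 is below the cut-off 1.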



Consistency. The next assertion summarizes minimal conditions to ensure consistency
of the estimator $\hat\ell_h$ defined in (3.5).

Proposition 3.1. Assume an $n$-sample of $(Y, X)$ satisfying (1.1) with $\sigma > 0$. Consider
the estimator $\hat\ell_h$ with dimension $m := m(n)$ and threshold $\alpha := \alpha(n)$ satisfying $1/m = o(1)$,
l/a = o(l) and a 2 /n = o(l) as n — > oo. If in addition X E X„ and e G r] ^ 1, then we 
have for all $h, \beta \in L^2[0,1]$ that $E|\hat\ell_h - \ell_h(\beta)|^2 = o(1)$ as $n \to \infty$.

Remark 3.1. It is worth noting that the last result states consistency of the estimator $\hat\ell_h$
without any restriction other than square integrability on both the slope parameter
and the representer. □

The upper bound. In the last assertion we have shown that the estimator $\hat\ell_h$ defined in
(3.5) is consistent without additional regularity conditions. However, if these conditions are
given through ellipsoids $\mathcal{F}_\gamma^\rho$ and $\mathcal{F}_\omega^\tau$ for the slope function and the representer respectively
and a link condition $\mathcal{N}_v^d$ for the covariance operator, then the next theorem states that the
rate $\max(\delta_*, 1/n)$ of the lower bound given in Theorem 2.1 provides up to a constant also
an upper bound of the risk of the estimator $\hat\ell_h$. Therefore the rate $\max(\delta_*, 1/n)$ is optimal
and hence $\hat\ell_h$ is minimax-optimal.

Theorem 3.2. Assume an $n$-sample of $(Y, X)$ satisfying (1.1) with $\sigma > 0$. Suppose that
the regressor $X$ is second order stationary with associated covariance operator $T_{\mathrm{cov}} \in \mathcal{N}_v^d$,
$d \ge 1$. Let $m_* := m_*(n)$ and $\delta_* := \delta_*(m_*)$ be such that (2.4) holds for some $\Delta \ge 1$. Consider
the estimator £h with m := and a := nmax(l, 2c2A/7 m< J. If in addition X G ?Cl and 
£6^, k ^ 8 then for some generic constant C > we have 

sup sup E|4-4(#)| 2 Dd 5 A 3 A 2 pT?][pdA + a 2 } maxfj;,^ 1 ) 
for all sequences $\gamma$, $\omega$ and $v$ satisfying Assumption 2.1.






Remark 3.2. It is worth noting that the bound in the last result is nonasymptotic. Furthermore,
from Theorems 2.1 and 3.2 it follows that for all sequences $\gamma$, $\omega$ and $v$ satisfying
the minimal regularity conditions summarized in Assumption 2.1 the estimator $\hat\ell_h$ attains
the optimal rate $\max(\delta_*, n^{-1})$ and hence is minimax-optimal. We emphasize the interesting
influence of the sequences $\gamma$, $\omega$ and $v$. As we see from Theorems 2.1 and 3.2, if
the sequence $v$ decreases more quickly to zero then the obtainable optimal rate of convergence
becomes slower. On the other hand, a faster increasing sequence $\gamma$ or $\omega$ leads to a faster
optimal rate. In other words, as expected, values of a linear functional given by a slope
function or representer satisfying a stronger regularity condition can be estimated faster.
Moreover, independent of the imposed regularity assumption on the slope parameter (even
$\gamma \equiv 1$ is possible) the parametric rate $n^{-1}$ is obtained if and only if the sequence $(\omega_j v_j)_{j \ge 1}$
is bounded away from zero. Note further that if the sequence $\gamma$ increases then in Theorem 3.2
for all large enough $n$ the threshold $\alpha = n$ is used to construct the estimator $\hat\ell_h$. On the
other hand the choice of the dimension $m$ depends on the sequences $\gamma$ and $v$ characterizing
the regularity conditions imposed on the slope parameter and the covariance operator
respectively, which are in practice not known. Building data-driven rules that
choose the value of $m$ automatically is certainly a topic that deserves further attention,
and one promising direction is to adapt the selection technique proposed in Efromovich and
Koltchinskii [2001], Goldenshluger and Pereverzev [2000] or Tsybakov [2000]. □

3.1 The finitely and infinitely smoothing case. 

In the rest of this section we describe the prior information about the unknown slope
function $\beta$ and the given representer $h$ by their level of smoothness. Therefore, let us
introduce the Sobolev space of periodic functions $W_r$, $r \ge 0$, which for integer $r$ is given by

\[ W_r = \bigl\{ f \in H_r : f^{(j)}(0) = f^{(j)}(1),\ j = 0, 1, \dots, r - 1 \bigr\}, \]

where $H_r := \{ f \in L^2[0,1] : f^{(r-1)} \text{ absolutely continuous},\ f^{(r)} \in L^2[0,1] \}$ is a Sobolev space.

Furthermore, consider $\mathcal{F}_{w^r}$ given in (2.1) with weight sequence $w_1 = 1$, $w_j = |j|^2$, $j \ge 2$.
Then it is well-known that the subset $\mathcal{F}_{w^r}$ coincides with the Sobolev space of periodic
functions $W_r$ (cf. Neubauer [1988a,b], Mair and Ruymgaart [1996] or Tsybakov [2004]).
Therefore, let us denote by $W_r^c := \mathcal{F}_{w^r}^c$, $c > 0$, an ellipsoid in the Sobolev space $W_r$. We use
in case $r = 0$ again the convention that $W_r^c$ denotes an ellipsoid in $L^2[0,1]$. In the rest of
this section we consider the Sobolev ellipsoids $W_p^\rho$, $p \ge 0$, and $W_s^\tau$, $s \ge 0$, as classes of slope
parameters and representers respectively. To illustrate the previous results we consider two
special cases describing a "regular decay" of the unknown eigenvalues of $T_{\mathrm{cov}}$. Precisely, we
assume in the following the sequence $v$ to be either polynomially decreasing, i.e., $v_1 = 1$
and $v_j = |j|^{-2a}$, $j \ge 2$, for some $a > 1/2$, or exponentially decreasing, i.e., $v_1 = 1$ and
$v_j = \exp(-|j|^{2a})$, $j \ge 2$, for some $a > 0$. In the polynomial case easy calculus shows
that a covariance operator $T_{\mathrm{cov}} \in \mathcal{N}_v^d$ acts like integrating $(2a)$-times and hence it is called
finitely smoothing (cf. Natterer [1984]). Furthermore, since the eigenfunctions of $T_{\mathrm{cov}}$ are
$\{\psi_j\}$ it follows that $T_{\mathrm{cov}} \in \mathcal{N}_v^d$ holds if and only if the eigenvalues $[T_{\mathrm{cov}}]_{j,j}$ of $T_{\mathrm{cov}}$ satisfy
$[T_{\mathrm{cov}}]_{j,j} \asymp_d |j|^{-2a}$, which is the usual case considered in the literature (cf. Crambes et al.
[2009] or Hall and Horowitz [2007]). On the other hand, in the exponential case it can easily
be seen that the link condition $T_{\mathrm{cov}} \in \mathcal{N}_v^d$ implies $\mathcal{R}(T_{\mathrm{cov}}) \subset W_p$ for all $p > 0$; therefore the
operator $T_{\mathrm{cov}}$ is called infinitely smoothing (cf. Mair [1994]). Moreover, $T_{\mathrm{cov}} \in \mathcal{N}_v^d$ holds if
and only if the eigenvalues $[T_{\mathrm{cov}}]_{j,j}$ of $T_{\mathrm{cov}}$ satisfy $[T_{\mathrm{cov}}]_{j,j} \asymp_d \exp(-|j|^{2a})$ by using that $\{\psi_j\}$
are the eigenfunctions of $T_{\mathrm{cov}}$. Since in both cases the minimal regularity conditions given
in Assumption 2.1 are satisfied, the lower bounds presented in the next assertion follow
directly from Theorem 2.1. Here and subsequently, we write $a_n \lesssim b_n$ when there exists
$C > 0$ such that $a_n \le C b_n$ for all sufficiently large $n \in \mathbb{N}$ and $a_n \sim b_n$ when $a_n \lesssim b_n$ and
$b_n \lesssim a_n$ simultaneously.

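Proposition 3.3. Under the assumptions of Theorem 2.1 we have for any estimator $\tilde\ell$

(i) in the polynomial case, i.e. $v_1 = 1$ and $v_j = |j|^{-2a}$, $j \ge 2$, for some $a > 1/2$, that

\[ \sup_{\beta \in W_p^\rho}\ \sup_{h \in W_s^\tau} \bigl\{ E|\tilde\ell - \ell_h(\beta)|^2 \bigr\} \gtrsim \max\bigl(n^{-(p+s)/(p+a)}, n^{-1}\bigr), \]

(ii) in the exponential case, i.e. $v_1 = 1$ and $v_j = \exp(-|j|^{2a})$, $j \ge 2$, for some $a > 0$, that

\[ \sup_{\beta \in W_p^\rho}\ \sup_{h \in W_s^\tau} \bigl\{ E|\tilde\ell - \ell_h(\beta)|^2 \bigr\} \gtrsim (\log n)^{-(p+s)/a}. \]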

On the other hand, if the dimension $m$ and the threshold $\alpha$ in the definition of the
estimator $\hat\ell_h$ given in (3.5) are chosen appropriately, then by applying Theorem 3.2 the rates
of the lower bound given in the last assertion provide up to a constant also the upper bound
of the risk of $\hat\ell_h$, which is summarized in the next proposition. We have thus proved that
these rates are optimal and the proposed estimator $\hat\ell_h$ is minimax-optimal in both cases.

Proposition 3.4. Under the assumptions of Theorem 3.2 consider the estimator $\hat\ell_h$

(i) in the polynomial case, i.e. $v_1 = 1$ and $v_j = |j|^{-2a}$, $j \ge 2$, for some $a > 1/2$, with
dimension $m \sim n^{1/(2p+2a)}$ and threshold $\alpha \sim n$. Then

\[ \sup_{\beta \in W_p^\rho}\ \sup_{h \in W_s^\tau} \bigl\{ E|\hat\ell_h - \ell_h(\beta)|^2 \bigr\} \lesssim \max\bigl(n^{-(p+s)/(p+a)}, n^{-1}\bigr), \]

(ii) in the exponential case, i.e. $v_1 = 1$ and $v_j = \exp(-|j|^{2a})$, $j \ge 2$, for some $a > 0$, with
dimension $m \sim (\log n)^{1/(2a)}$ and threshold $\alpha \sim n$. Then

\[ \sup_{\beta \in W_p^\rho}\ \sup_{h \in W_s^\tau} \bigl\{ E|\hat\ell_h - \ell_h(\beta)|^2 \bigr\} \lesssim (\log n)^{-(p+s)/a}. \]
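The polynomial rate and the dimension choice in part (i) can be traced back to the balance condition of Theorem 2.1. The following short derivation is a sketch that reads (2.4) as requiring $\gamma_{m_*}/(n\,v_{m_*}) \asymp 1$ with $\delta_* = 1/(\gamma_{m_*}\omega_{m_*})$ (this reading is an assumption, chosen because it reproduces the rates stated in Propositions 3.3 and 3.4), and it uses the Sobolev weights $\gamma_j = j^{2p}$, $\omega_j = j^{2s}$ together with $v_j = j^{-2a}$.

```latex
% Polynomial case: \gamma_j = j^{2p}, \omega_j = j^{2s}, v_j = j^{-2a} for j \ge 2.
\begin{align*}
\frac{\gamma_m}{n\,v_m} = \frac{m^{2p+2a}}{n} \asymp 1
  &\iff m_* \sim n^{1/(2p+2a)}, \\
\delta_* = \frac{1}{\gamma_{m_*}\,\omega_{m_*}} = m_*^{-(2p+2s)}
  &\sim n^{-(2p+2s)/(2p+2a)} = n^{-(p+s)/(p+a)}, \\
\delta_* \lesssim n^{-1}
  &\iff \frac{p+s}{p+a} \ge 1 \iff s \ge a .
\end{align*}
```

The last line shows why, in the polynomial case, the parametric rate is reached exactly when the representer is at least as smooth as the degree of ill-posedness.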

Remark 3.3. We emphasize the interesting influence of the parameters $p$, $s$ and $a$
characterizing the smoothness of $\beta$, $h$ and the decay of the eigenvalues of $T_{\mathrm{cov}}$ respectively.
As we see from Propositions 3.3 and 3.4, if the value of $a$ increases the obtainable optimal
rate of convergence becomes slower. Therefore, the parameter $a$ is often called the degree of
ill-posedness (cf. Natterer [1984]). On the other hand, an increase in the value $p + s$
leads to a faster optimal rate. In other words, as expected, values of a linear functional
given by a smoother slope function or representer can be estimated faster. Moreover, in the
polynomial case independent of the imposed smoothness assumption on the slope parameter
(even $p = 0$ is possible) the parametric rate $n^{-1}$ is obtained if and only if the representer is
smoother than the degree of ill-posedness of $T_{\mathrm{cov}}$, i.e., $s \ge a$. The situation is different in
the exponential case. As long as the representer $h$ is only finitely many times differentiable,
due to Propositions 3.3 and 3.4 the optimal rate of convergence is logarithmic. However, if
we restrict the class of representers even more, e.g. by considering $\mathcal{F}_\omega^\tau$ with weights $\omega_1 := 1$,
$\omega_j = \exp(|j|^{2q})$, $j \ge 2$, which contains only analytic functions given $q > 1$ (cf. Kawata
[1972]), then faster rates are possible. Again independent of the imposed smoothness
assumption on the slope parameter (again $p = 0$ is possible) the parametric rate $n^{-1}$ is
obtained if and only if the representer $h$ is smoother than the degree of ill-posedness of
$T_{\mathrm{cov}}$, e.g., $q \ge a$. Finally, in contrast to the polynomial case, in the exponential case the
smoothing parameter $m$ does not depend on the value of $p$. It follows that the proposed
estimator is automatically adaptive, i.e., it does not depend on a priori knowledge of
the degree of smoothness of the slope function $\beta$. However, the choice of the smoothing
parameter depends on the smoothing properties of $T_{\mathrm{cov}}$, i.e., the value of $a$. □

Remark 3.4. There is an interesting issue hidden in the parametrization we have chosen.
Consider a classical indirect regression model given by the covariance operator $T_{\mathrm{cov}}$ and
Gaussian white noise $W$, i.e., $g_n = T_{\mathrm{cov}}\beta + n^{-1/2} W$ (for details see e.g. Hoffmann and
Reiß [2008]). If $T_{\mathrm{cov}}$ is finitely smoothing, i.e., $v_1 = 1$ and $v_j = |j|^{-2a}$, $j \ge 2$, then it is
shown in Johannes and Kroll [2009] that the optimal rate of convergence over the classes
$W_p^\rho$ and $W_s^\tau$ of any estimator of $\ell_h(\beta)$ is of order $\max(n^{-(p+s)/(p+2a)}, n^{-1})$. However, from
Propositions 3.3 and 3.4 it follows that in a functional linear model the optimal rate is of order
$\max(n^{-(p+s)/(p+a)}, n^{-1})$. Thus comparing both rates we see that in a functional linear model
the covariance operator $T_{\mathrm{cov}}$ has the degree of ill-posedness $a$ while the same operator has in
the indirect regression model a degree of ill-posedness $2a$. In other words, in a functional
linear model we do not face the complexity of an inversion of $T_{\mathrm{cov}}$ but only of its square root
$T_{\mathrm{cov}}^{1/2}$. The same remark holds true in the exponential case. But the rate of convergence is
the same as in an indirect regression model with Gaussian white noise (cf. Johannes and
Kroll [2009]). This, however, is due to the fact that in case $v_j \asymp_d \exp(-r|j|^{2a})$, $j \in \mathbb{N}$,
for some $r > 0$, the dependence of the rate of convergence on the value $r$ is hidden in the
constant (a more detailed discussion can be found in Johannes [2009b]). □



4 Optimal local estimation in the general case. 

In this section the pre-specified basis $\{\psi_j\}$ does not necessarily correspond to the set of
eigenfunctions of $T_{cov}$. In this situation, for each $m \geq 1$ the matrix $[T_{cov}]_m$ will in general no
longer be diagonal. Nevertheless, the estimator proposed below is also based on a
dimension reduction together with an additional thresholding. That is, if $\hat g$ and $\hat T_{cov}$ denote the
estimators of $g$ and $T_{cov}$, respectively, given in (3.4), then the general estimator of the linear
functional $\ell_h(\beta)$ is now defined by

$$\hat\ell_h := \begin{cases} [h]_m^t\,[\hat T_{cov}]_m^{-1}\,[\hat g]_m, & \text{if } [\hat T_{cov}]_m \text{ is nonsingular and } \|[\hat T_{cov}]_m^{-1}\| \leq \alpha, \\ 0, & \text{otherwise,} \end{cases} \qquad (4.1)$$

where the dimension parameter $m = m(n)$ and the threshold $\alpha = \alpha(n)$ again have to tend
to infinity as the sample size $n$ increases. In fact, the general estimator is obtained from
the linear functional $\ell_h(\beta)$ by replacing the unknown slope parameter $\beta$ by an estimator
proposed by Cardot and Johannes [2008], which takes its inspiration from the linear Galerkin
approach coming from the inverse problem community (c.f. Efromovich and Koltchinskii
[2001] or Hoffmann and Reiß [2008]).
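The thresholded estimator can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name and arguments are hypothetical, the regressors are assumed to be given through their basis coefficients $[X_i]_j$, and the empirical quantities are assumed to take the moment forms $[\hat T_{cov}]_m = n^{-1}\sum_i [X_i]_m[X_i]_m^t$ and $[\hat g]_m = n^{-1}\sum_i Y_i[X_i]_m$.

```python
import numpy as np


def thresholded_galerkin_estimate(X_coef, Y, h_coef, m, alpha):
    """Sketch of the thresholded plug-in estimate of l_h(beta).

    X_coef : (n, J) array of basis coefficients [X_i]_j, with J >= m
    Y      : (n,) array of scalar responses
    h_coef : (J,) array of basis coefficients [h]_j of the representer
    m      : dimension parameter
    alpha  : threshold on the spectral norm of the inverse
    """
    n = X_coef.shape[0]
    Xm = X_coef[:, :m]
    T_hat = Xm.T @ Xm / n   # empirical covariance matrix (assumed moment form)
    g_hat = Xm.T @ Y / n    # empirical cross-covariance vector (assumed moment form)
    # The smallest singular value decides nonsingularity, and its reciprocal
    # is the spectral norm of the inverse matrix.
    s_min = np.linalg.svd(T_hat, compute_uv=False)[-1]
    if s_min <= 0.0 or 1.0 / s_min > alpha:
        return 0.0
    return float(h_coef[:m] @ np.linalg.solve(T_hat, g_hat))
```

In a noise-free toy example where the slope coefficients vanish beyond index $m$, the sketch recovers $\sum_{j \leq m} [h]_j[\beta]_j$ up to numerical error, while a very small threshold forces the value zero.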



Consistency. The next assertion summarizes minimal conditions to ensure consistency
of the estimator $\hat\ell_h$ introduced in (4.1).

Proposition 4.1. Assume an $n$-sample of $(Y,X)$ satisfying (1.1) with $\sigma > 0$. Let $X \in \mathcal{X}_\eta^{8}$
and $\varepsilon \in \mathcal{E}_\eta^{8}$, $\eta \geq 1$. Consider $\hat\ell_h$ defined with dimension $m := m(n)$ and threshold $\alpha := \alpha(n)$
satisfying $\alpha \geq 2\|[T_{cov}]_m^{-1}\|$ and, as $n \to \infty$, $1/m = o(1)$, $\alpha/n = o(1)$, $m^3/n = O(1)$
and $(m^2\alpha^2)/n = O(1)$. If in addition $\sup_{m\in\mathbb{N}} \|(T_{cov})_m^{-1}\Pi_m T_{cov}\Pi_m^{\perp}\| < \infty$, then we have
$E|\hat\ell_h - \ell_h(\beta)|^2 = o(1)$ as $n \to \infty$.

In Proposition 4.1 consistency is only obtained under the additional condition
$\sup_{m\in\mathbb{N}} \|(T_{cov})_m^{-1}\Pi_m T_{cov}\Pi_m^{\perp}\| < \infty$, which is known to be sufficient to ensure $L^2$-convergence
of the Galerkin solution, given by $\beta_m = \sum_{j=1}^m [\beta_m]_j\psi_j$ with $[\beta_m]_m = [T_{cov}]_m^{-1}[g]_m$, to the slope
parameter $\beta$ as $m \to \infty$. However, this condition is automatically fulfilled if the operator
$T_{cov}$ satisfies a link condition, i.e., $T_{cov} \in \mathcal{N}_v^d$, which is summarized in the next assertion.

Corollary 4.2. Assume an $n$-sample of $(Y,X)$ satisfying (1.1) with $\sigma > 0$ and associated
covariance operator $T_{cov} \in \mathcal{N}_v^d$, $d \geq 1$. Consider the estimator $\hat\ell_h$ with threshold $\alpha = 8d^3/v_m$
and dimension $m := m(n)$ chosen such that $1/m = o(1)$, $1/(n v_m) = o(1)$, $m^3/n = O(1)$
and $m^2/(v_m^2 n) = O(1)$ as $n \to \infty$. If in addition $X \in \mathcal{X}_\eta^{8}$ and $\varepsilon \in \mathcal{E}_\eta^{8}$, $\eta \geq 1$, then we have
$E|\hat\ell_h - \ell_h(\beta)|^2 = o(1)$ as $n \to \infty$.



The upper bound. The last assertions show that the estimator $\hat\ell_h$ defined in (4.1) is
consistent without any additional regularity conditions on the slope function and the representer.
The following theorem provides an upper bound if these conditions are given again through
ellipsoids $\mathcal{F}_\gamma^\rho$ and $\mathcal{F}_\omega^\tau$ for the slope function and the representer, respectively, together with a
link condition $\mathcal{N}_v^d$ for the covariance operator. But in contrast to the case of a second order
stationary regressor considered in the last section, the following additional properties of the
sequences $\gamma$, $\omega$ and $v$ are needed. We suppose that for some $k \in \mathbb{N}$

2k — k l+k/2 sj-fr 3 

O(l), ^ TFr = 0{l\ ^- = 0(1), ^ = 0(1) as 00, (4.2) 



max(o~*,l/n) ' n k / 2 ' n k l 2 ~ l ' n 

where $m_* := m_*(n)$ and $\delta_* := \delta_*(m_*)$ are given by (2.4). The next theorem states that in
this situation the rate $\max(\delta_*, 1/n)$ of the lower bound given in Theorem 2.1 provides, up
to a constant, also an upper bound for the general estimator $\hat\ell_h$ defined in (4.1). Thus we have
proved that the rate $\max(\delta_*, 1/n)$ is optimal and hence $\hat\ell_h$ is minimax-optimal.

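Theorem 4.3. Assume an $n$-sample of $(Y,X)$ satisfying (1.1) with $\sigma > 0$ and associated
covariance operator $T_{cov} \in \mathcal{N}_v^d$, $d \geq 1$. Suppose that the sequences $\gamma$, $\omega$ and $v$ satisfy
Assumption 2.1 and condition (4.2) for some $k \geq 3$. Let $m_* := m_*(n)$ and $\delta_* := \delta_*(m_*)$ be
such that (2.4) holds for some $\Delta \geq 1$. Consider $\hat\ell_h$ defined with dimension $m := m_*$ and
threshold $\alpha := n\max(1, 8d^3\Delta/\gamma_{m_*})$. If in addition $X \in \mathcal{X}_\eta^{4k}$ and $\varepsilon \in \mathcal{E}_\eta^{4k}$, then for some
generic constant $C > 0$ we have

$$\sup_{\beta\in\mathcal{F}_\gamma^\rho}\ \sup_{h\in\mathcal{F}_\omega^\tau} E|\hat\ell_h - \ell_h(\beta)|^2 \leq C\,\Delta^3 d^{11} D\,\eta\,\rho\,\tau\,\{\sigma^2 + \rho d^5\Lambda/\gamma_{m_*} + 1\}\,\max(\delta_*, n^{-1}).$$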



Remark 4.1. We shall stress that the bound in the last theorem is again nonasymptotic.
Moreover, it is worth noting that if the sequence $\gamma$ increases then the condition on the
threshold reads $\alpha = n$ for all sufficiently large $n$. Therefore, also in the general case only
the dimension $m$ has to be chosen in a data-driven way in order to build an adaptive estimation
procedure. Furthermore, even in case $\gamma \equiv 1$, i.e., $\beta$ is only assumed to be square integrable,
the upper bound still tends to zero as long as the sequence $\omega$ is increasing. Moreover,
the proposed plug-in estimator attains again the parametric rate under the conditions of
Theorem 4.3 if and only if the sequence $(\omega_j v_j)_{j\geq 1}$ is bounded away from zero. Note
furthermore that if the eigenfunctions of the operator $T_{cov}$ are given by $\{\psi_j\}$, then $T_{cov} \in \mathcal{N}_v^d$
holds if and only if the corresponding eigenvalues $\lambda_j = \langle T_{cov}\psi_j, \psi_j\rangle$, $j \geq 1$, satisfy $\lambda_j \asymp_d v_j$.
Hence, in this situation the optimal rate obtained in the last assertion equals the rate in
Theorem 3.2. However, the set $\mathcal{N}_v^d$ also contains operators with eigenfunctions not given
by $\{\psi_j\}$. Then their corresponding eigenvalues may decay far more slowly than the sequence of
weights $v$. Hence, by using a projection onto the basis $\{\psi_j\}$ instead of their eigenfunctions,
the obtainable rate of convergence given in Theorem 4.3 may be far slower than the rate
given in Theorem 3.2. However, the rate in Theorem 4.3 is optimal, and thus cannot be
improved without additional information. □

The finitely and infinitely smoothing case. In the rest of this section the basis $\{\psi_j\}$
is again given by the trigonometric functions defined in (3.1). But in contrast to Section
3.1 this basis does not necessarily correspond to the set of eigenfunctions of $T_{cov}$. Furthermore,
we consider also the Sobolev ellipsoids $\mathcal{W}_p^\rho$, $p \geq 0$, and $\mathcal{W}_s^\tau$, $s \geq 0$, for the slope parameter
and the representer, respectively. In the following proposition we illustrate the general result
obtained for the estimator $\hat\ell_h$ defined in (4.1) by considering again the finitely and infinitely
smoothing case presented in Section 3.1.

Proposition 4.4. Under the assumptions of Theorem 4.3 consider the estimator $\hat\ell_h$

(i) in the polynomial case, i.e. $v_1 = 1$ and $v_j = |j|^{-2a}$, $j \geq 2$, for some $a > 1/2$, with
$m \sim n^{1/(2p+2a)}$ and threshold $\alpha \sim n$. If in addition $k \geq 12$ and $(p + a) \geq 3/2$ then

$$\sup_{\beta\in\mathcal{W}_p^\rho,\ h\in\mathcal{W}_s^\tau} \{E|\hat\ell_h - \ell_h(\beta)|^2\} \lesssim \max(n^{-(p+s)/(p+a)}, n^{-1}),$$

(ii) in the exponential case, i.e. $v_1 = 1$ and $v_j = \exp(-|j|^{2a})$, $j \geq 2$, for some $a > 0$, with
$m \sim (\log n)^{1/(2a)}$ and threshold $\alpha \sim n$. If $k \geq 8$ then

$$\sup_{\beta\in\mathcal{W}_p^\rho,\ h\in\mathcal{W}_s^\tau} \{E|\hat\ell_h - \ell_h(\beta)|^2\} \lesssim (\log n)^{-(p+s)/a}.$$

Remark 4.2. The last assertion shows that under stronger moment conditions, e.g. $k \geq 12$
in case (i), and additional restrictions on the parameters $a$ and $p$, e.g. $(p + a) \geq 3/2$ in case
(i), the rate of the lower bound over the Sobolev ellipsoids $\mathcal{W}_p^\rho$ and $\mathcal{W}_s^\tau$ (see Proposition
3.3) provides, up to a constant, also an upper bound for the estimator $\hat\ell_h$ for both a finitely
and an infinitely smoothing covariance operator. Thereby, this rate is optimal also in the case of
unknown eigenfunctions, and hence $\hat\ell_h$ is minimax-optimal. Furthermore, the findings
discussed in Remarks 3.3 and 3.4 still apply here. □
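The tuning rules of Proposition 4.4 can be evaluated with a short sketch. The function names are hypothetical, and all proportionality constants hidden in the relation $\sim$ are set to one here purely for illustration; the proposition only fixes the orders of magnitude.

```python
import math


def polynomial_case(n, p, s, a):
    """Orders for v_j = |j|^(-2a): dimension, threshold and rate (constants set to 1)."""
    m = max(1, round(n ** (1.0 / (2 * p + 2 * a))))
    alpha = float(n)
    rate = max(n ** (-(p + s) / (p + a)), 1.0 / n)
    return m, alpha, rate


def exponential_case(n, p, s, a):
    """Orders for v_j = exp(-|j|^(2a)): dimension, threshold and rate (constants set to 1)."""
    m = max(1, round(math.log(n) ** (1.0 / (2 * a))))
    alpha = float(n)
    rate = math.log(n) ** (-(p + s) / a)
    return m, alpha, rate
```

For $n = 10^6$ and $p = s = 1$, the polynomial case with $a = 2$ gives a dimension of order 10 and a rate of order $10^{-4}$, whereas the exponential case with $a = 1$ gives a dimension of order 4 and only a logarithmic rate.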

A Appendix

A.1 Proofs of Section 2.

Consider the covariance operator $T_{cov}$ associated to the regressor $X$, then $E[X]_j^2 = \langle T_{cov}\psi_j, \psi_j\rangle$,
$j \in \mathbb{N}$. Therefore, if the link condition (2.2), i.e., $T_{cov} \in \mathcal{N}_v^d$, is satisfied, then it follows that
$E[X]_j^2 \asymp_d v_j$, for all $j \in \mathbb{N}$. This result will be used below without further reference. We
shall prove at the end of this section the technical Lemma A.1 used in the next proof.

Proof of the lower bound. 

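Proof of Theorem 2.1. Let $X_i$, $i \in \mathbb{N}$, be i.i.d. copies of $X$. Consider independent error
terms $\varepsilon_i \sim \mathcal{N}(0,1)$, $i \in \mathbb{N}$, which are independent of the random functions $\{X_i, i \in \mathbb{N}\}$.
Then we prove for any estimator $\breve\ell$ the following lower bounds:

$$\sup_{\beta\in\mathcal{F}_\gamma^\rho,\ h\in\mathcal{F}_\omega^\tau} E|\breve\ell - \ell_h(\beta)|^2 \geq \frac{\tau}{4\Delta}\,\min\Big\{\frac{\sigma^2}{2d}, \frac{\rho}{\Delta}\Big\}\,\delta_*, \qquad (A.1)$$

$$\sup_{\beta\in\mathcal{F}_\gamma^\rho,\ h\in\mathcal{F}_\omega^\tau} E|\breve\ell - \ell_h(\beta)|^2 \geq \frac{\tau}{4}\,\min\Big\{\frac{\sigma^2}{2d}, \rho\Big\}\,\frac{1}{n}. \qquad (A.2)$$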

The result follows then by combination of these two lower bounds.

Proof of (A.1). Define the slope function $\beta_* := [\beta_*]_{m_*}\psi_{m_*}$, where $[\beta_*]_{m_*}$ is given in
(A.7) (Lemma A.1) and $m_* := m_*(n) \in \mathbb{N}$ satisfies (2.4) for some $\Delta \geq 1$. Then by using
(A.9) in Lemma A.1 we have $\beta_* \in \mathcal{F}_\gamma^\rho$. Consider the two slope functions $\beta_\theta := \theta\beta_*$,
$\theta \in \{-1, 1\}$. Then, for each $\theta$ the random variables $(Y_i, X_i)$ with $Y_i := \int_0^1 \beta_\theta(s)X_i(s)\,ds + \sigma\varepsilon_i$,
$i = 1, \ldots, n$, form a sample of the model (1.1) and we denote its joint distribution by $P_\theta$. In
case of $P_\theta$ the conditional distribution of $Y_i$ given $X_i$ is Gaussian with mean $\theta[\beta_*]_{m_*}[X_i]_{m_*}$
and variance $\sigma^2$. Thereby, it is easily seen that the log-likelihood of $P_{-1}$ with respect to $P_1$
is given by

$$\log\Big(\frac{dP_{-1}}{dP_1}\Big) = -\frac{2}{\sigma^2}\,[\beta_*]_{m_*}^2 \sum_{i=1}^n [X_i]_{m_*}^2 - \frac{2}{\sigma}\,[\beta_*]_{m_*} \sum_{i=1}^n \varepsilon_i [X_i]_{m_*},$$

and hence its expectation with respect to $P_1$ satisfies

$$E_{P_1}[\log(dP_{-1}/dP_1)] = -(2n/\sigma^2)\,[\beta_*]_{m_*}^2\,E[X]_{m_*}^2 \geq -(2dn/\sigma^2)\,[\beta_*]_{m_*}^2\,v_{m_*}.$$

In terms of Kullback-Leibler divergence this means $KL(P_1,P_{-1}) \leq (2dn/\sigma^2)\,[\beta_*]_{m_*}^2\,v_{m_*}$.

Since the Hellinger distance $H(P_1,P_{-1})$ between $P_1$ and $P_{-1}$ satisfies $H^2(P_1,P_{-1}) \leq KL(P_1,P_{-1})$
it follows from (A.9) in Lemma A.1 that

$$H^2(P_1,P_{-1}) \leq \frac{2dn}{\sigma^2}\cdot[\beta_*]_{m_*}^2\cdot v_{m_*} \leq 1. \qquad (A.3)$$
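The Kullback-Leibler computation above rests on the elementary fact that two Gaussian laws with common variance $\sigma^2$ and means $\pm\mu$ have divergence $2\mu^2/\sigma^2$; conditionally on $X_i$, this applies with $\mu = [\beta_*]_{m_*}[X_i]_{m_*}$. The following sketch checks that closed form by numerical integration. The function name, the integration window and the grid size are hypothetical choices made only for this illustration.

```python
import math


def kl_gaussian_shift(mu, sigma, half_width=12.0, steps=200001):
    """Riemann-sum approximation of KL(N(mu, sigma^2) || N(-mu, sigma^2))."""
    dx = 2.0 * half_width * sigma / (steps - 1)
    total = 0.0
    for i in range(steps):
        y = -half_width * sigma + i * dx
        # Log-densities up to the common normalising constant, which cancels
        # in the log-ratio.
        log_p = -((y - mu) ** 2) / (2 * sigma ** 2)
        log_q = -((y + mu) ** 2) / (2 * sigma ** 2)
        p = math.exp(log_p) / (sigma * math.sqrt(2 * math.pi))
        total += p * (log_p - log_q) * dx
    return total
```

Summing the per-observation value $2\mu^2/\sigma^2$ over $i = 1, \ldots, n$ and taking the expectation over $X_i$ gives the factor $(2n/\sigma^2)[\beta_*]_{m_*}^2 E[X]_{m_*}^2$ used in the proof.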



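Consider the Hellinger affinity $\rho(P_1,P_{-1}) = \int \sqrt{dP_1\,dP_{-1}}$, then we obtain for any estimator
$\breve\ell$ and for all $h \in \mathcal{F}_\omega^\tau$ that

Due to the identity $\rho(P_1,P_{-1}) = 1 - \frac{1}{2}H^2(P_1,P_{-1})$ combining (A.3) with (A.4) yields

$$\big\{E_{P_1}|\breve\ell - \ell_h(\beta_1)|^2 + E_{P_{-1}}|\breve\ell - \ell_h(\beta_{-1})|^2\big\} \geq \frac{1}{2}\,|\ell_h(\beta_*)|^2. \qquad (A.5)$$

Consider now the representer $h_* := [h_*]_{m_*}\psi_{m_*}$, where $[h_*]_{m_*}$ is given in (A.7) (Lemma A.1).
Then by construction $h_* \in \mathcal{F}_\omega^\tau$ and $|\ell_{h_*}(\beta_*)|^2 = [h_*]_{m_*}^2[\beta_*]_{m_*}^2$. From (A.5) we conclude then
for each estimator $\breve\ell$ that

$$\sup_{\beta\in\mathcal{F}_\gamma^\rho,\ h\in\mathcal{F}_\omega^\tau} E|\breve\ell - \ell_h(\beta)|^2 \geq \sup_{\theta\in\{-1,1\}} E_{P_\theta}|\breve\ell - \ell_{h_*}(\beta_\theta)|^2$$

$$\geq \frac{1}{2}\big\{E_{P_1}|\breve\ell - \ell_{h_*}(\beta_1)|^2 + E_{P_{-1}}|\breve\ell - \ell_{h_*}(\beta_{-1})|^2\big\}$$

$$\geq \frac{1}{4}\,[h_*]_{m_*}^2[\beta_*]_{m_*}^2 \geq \frac{\tau}{4\Delta}\,\min\Big\{\frac{\sigma^2}{2d}, \frac{\rho}{\Delta}\Big\}\,\delta_*,$$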
where the last inequality follows from (A.9) in Lemma A.1, which proves (A.1).

The proof of (A.2) is similar to the proof of (A.1), but uses (A.8) in Lemma A.1 rather
than (A.9). Precisely, we consider the slope function $\beta_* := [\beta_*]_1\psi_1$ and the representer
$h_* := [h_*]_1\psi_1$ with $[\beta_*]_1$ and $[h_*]_1$ given in (A.6) (Lemma A.1). Then by following along
the same lines as in the proof of (A.1) we obtain (A.2), which completes the proof. □



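Technical assertion.

Lemma A.1. Let $m_* := m_*(n)$ and $\delta_* := \delta_*(m_*)$ be given by (2.4) with $\Delta \geq 1$. If we define

$$[h_*]_1^2 := \tau, \quad [\beta_*]_1^2 := \frac{\zeta}{n}, \quad \text{with } \zeta := \min\Big\{\frac{\sigma^2}{2d}, \rho\Big\}, \qquad (A.6)$$

$$[h_*]_{m_*}^2 := \frac{\tau}{\omega_{m_*}} \ \text{ and } \ [\beta_*]_{m_*}^2 := \frac{\zeta}{n v_{m_*}}, \quad \text{where } \zeta := \min\Big\{\frac{\sigma^2}{2d}, \frac{\rho}{\Delta}\Big\}. \qquad (A.7)$$

Then under Assumption 2.1, i.e., $\gamma_1 = \omega_1 = v_1 = 1$, we have

$$\frac{2dn}{\sigma^2}[\beta_*]_1^2 v_1 \leq 1, \quad [\beta_*]_1^2\gamma_1 \leq \rho, \quad [h_*]_1^2[\beta_*]_1^2 \geq \tau\,\min\Big\{\frac{\sigma^2}{2d}, \rho\Big\}\,\frac{1}{n}, \qquad (A.8)$$

$$\frac{2dn}{\sigma^2}[\beta_*]_{m_*}^2 v_{m_*} \leq 1, \quad [\beta_*]_{m_*}^2\gamma_{m_*} \leq \rho \ \text{ and } \ [h_*]_{m_*}^2[\beta_*]_{m_*}^2 \geq \frac{\tau}{\Delta}\,\min\Big\{\frac{\sigma^2}{2d}, \frac{\rho}{\Delta}\Big\}\,\delta_*. \qquad (A.9)$$

Proof. We only prove (A.9). The proof of (A.8) follows in analogy and we omit the
details. The first inequality in (A.9) follows trivially by using the definition of $\zeta$, while
the definition of $m_*$ given in (2.4) implies the second, i.e., $[\beta_*]_{m_*}^2\gamma_{m_*} \leq \zeta\gamma_{m_*}/(n v_{m_*}) \leq \zeta\Delta \leq \rho$.
To deduce the third estimate from the definition of $m_*$ and $\delta_*$ observe that
$[h_*]_{m_*}^2[\beta_*]_{m_*}^2 = \tau\zeta\,\delta_*\gamma_{m_*}/(n v_{m_*}) \geq \tau\zeta\,\delta_*/\Delta$, which proves the lemma. □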

A.2 Proofs of Section 3.

We begin by defining and recalling notations to be used in the proofs of this section. Since
the eigenfunctions of $T_{cov}$ associated to the regressor $X$ are given by the basis $\{\psi_j\}$ it follows
that the values $\lambda_j := E[X]_j^2 = [T_{cov}]_{jj}$ are the corresponding eigenvalues. Moreover, $T_{cov}$
satisfies the link condition (2.2), i.e., $T_{cov} \in \mathcal{N}_v^d$, if and only if $\lambda_j \asymp_d v_j$, for all $j \in \mathbb{N}$.
Thus, if $T_{cov} \in \mathcal{N}_v^d$ then $E\|X\|^2 \leq d\Lambda$ by using Assumption 2.1. These results will be used
below without further reference. Furthermore, given independent and identically distributed
(i.i.d.) copies $(Y_i, X_i)$, $1 \leq i \leq n$, of $(Y, X)$ we use for all $j \in \mathbb{N}$ the notations

$$[X_i]_j := \langle X_i, \psi_j\rangle, \quad [\beta]_j := \langle\beta, \psi_j\rangle, \quad [h]_j := \langle h, \psi_j\rangle, \quad [Z_i]_j := [X_i]_j/\sqrt{\lambda_j},$$

$$\hat\lambda_j := [\hat T_{cov}]_{jj}, \quad T_{n,j} := \frac{1}{n}\sum_{i=1}^n (Y_i - [\beta]_j[X_i]_j)[Z_i]_j = ([\hat g]_j - \hat\lambda_j[\beta]_j)/\sqrt{\lambda_j}. \qquad (A.10)$$

We shall prove at the end of this section a technical Lemma A.2 used in the following proofs.

Proof of the consistency.

Proof of Proposition 3.1. Let $\tilde\ell_h := \sum_{j=1}^m [h]_j[\beta]_j\,1\{[\hat T_{cov}]_{jj} \geq 1/\alpha\}$. Then the proof
is based on the decomposition

$$E|\hat\ell_h - \ell_h(\beta)|^2 \leq 2\{E|\hat\ell_h - \tilde\ell_h|^2 + E|\tilde\ell_h - \ell_h(\beta)|^2\}. \qquad (A.11)$$

We show below for some generic constant $C > 0$ the following bound

$$E|\hat\ell_h - \tilde\ell_h|^2 \leq C\eta\,(\alpha^2/n)\,\|h\|^2\,E\|X\|^2\,\{\sigma^2 + \|\beta\|^2E\|X\|^2\}, \qquad (A.12)$$

while we conclude from Lebesgue's dominated convergence theorem

$$E|\tilde\ell_h - \ell_h(\beta)|^2 = o(1) \ \text{ provided } 1/m = o(1) \text{ and } 1/\alpha = o(1) \text{ as } n \to \infty. \qquad (A.13)$$

Thereby, the conditions on $\alpha$ and $m$ ensure the convergence to zero as $n \to \infty$ of the two
terms on the right hand side in (A.11), which gives the result.

Proof of (A.12). By making use of the notations given in (A.10) it follows that

$$E|\hat\ell_h - \tilde\ell_h|^2 \leq \|h\|^2\sum_{j=1}^m E\Big(\frac{\lambda_j}{\hat\lambda_j^2}\,|T_{n,j}|^2\,1\{\hat\lambda_j \geq 1/\alpha\}\Big) \leq \|h\|^2\alpha^2\sum_{j=1}^m \lambda_j\,E|T_{n,j}|^2$$

and hence (A.12) follows by using (A.18) in Lemma A.2 together with $\sum_{j=1}^m \lambda_j \leq E\|X\|^2$.

Proof of (A.13). By making use of the relation

$$\tilde\ell_h - \ell_h(\beta) = -\sum_{j>m}[h]_j[\beta]_j - \sum_{j=1}^m [h]_j[\beta]_j\,1\{\hat\lambda_j < 1/\alpha\}, \qquad (A.14)$$

where $|\sum_{j>m}[h]_j[\beta]_j| = o(1)$ as $m \to \infty$ due to Lebesgue's dominated convergence theorem
and

$$E\Big|\sum_{j=1}^m [h]_j[\beta]_j\,1\{\hat\lambda_j < 1/\alpha\}\Big|^2 \leq \|h\|^2\cdot\sum_{j=1}^m[\beta]_j^2\cdot P(\hat\lambda_j < 1/\alpha) \leq \|h\|^2\cdot\|\beta\|^2 < \infty.$$

Thus Lebesgue's dominated convergence theorem implies the result since for each $j \in \mathbb{N}$
$P(\hat\lambda_j < 1/\alpha) = o(1)$ as $n \to \infty$, which can be realized as follows. By using that $1/\alpha = o(1)$
there exists $n_j > 0$ such that for all $n \geq n_j$ it holds $\lambda_j \geq 2/\alpha$ and hence $P(\hat\lambda_j < 1/\alpha) \leq
P(\hat\lambda_j/\lambda_j < 1/2)$ together with (A.20) in Lemma A.2 implies the assertion, which completes
the proof. □

Proof of the upper bound. 

Proof of Theorem 3.2. Consider the decomposition (A. 11), then we show below under 
the condition e G £®, X G and Xj ^ 2/a, 1 ^ j ^ m, for some generic constant C > 
the following two bounds 

m rii2 

E|4-d 2 < C^ [ ^}dk{d 2 a 2 /n 2 + n- 1 v^)}r 1 {\\l3\\ 2 dk + <j 2 } (A.15) 

- W)\ 2 < C |H| 2 ||/3|| 2 n- 1 r? + ^S" 1 ||/3|| 2 \\h\\l (A.16) 

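Consider $m_* := m_*(n)$ given in (2.4), i.e., $\Delta^{-1} \leq \gamma_{m_*}/(n v_{m_*}) \leq \Delta$ with $\Delta \geq 1$. Then under
Assumption 2.1 the conditions $m = m_*$ and $\alpha = n\max(1, 2d\Delta/\gamma_{m_*})$ imply together $\alpha/n \leq
2d\Delta$, $1/(n v_m) \leq \Delta$, $\lambda_j \geq 2/\alpha$, $1 \leq j \leq m$, by using $\lambda_j \asymp_d v_j$. Consequently, for all $\beta \in \mathcal{F}_\gamma^\rho$
and $h \in \mathcal{F}_\omega^\tau$ we have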

E|4-4(/?)| 2 ^Cjn" 1 sup {u;7 1 t ;- 1 } a ! 5 A 2 A 2 + u;-l7-l + l/n}prr ? [p(iA + cT 2 ]. 
Now under Assumption 2.1, i.e., $\sup_{1\leq j\leq m_*}\{\omega_j^{-1}v_j^{-1}\} \leq D\,v_{m_*}^{-1}\max(\omega_{m_*}^{-1}, v_{m_*})$, it follows that
$n^{-1}\sup_{1\leq j\leq m_*}\{\omega_j^{-1}v_j^{-1}\} \leq D\Delta\max(\delta_*, 1/n)$ by using the definition of $m_*$ given in (2.4)
and $\delta_* = \omega_{m_*}^{-1}\gamma_{m_*}^{-1}$, which implies the result.

Proof of (A. 15). By using the notations introduced in (A. 10) we obtain the identity 

4 - C = E % • T nd ■ {Xj/Xj - 1} • t{Xj > l/a} 

f;^.T nj -l{A,<l/a} + f;%-r nj =:5 1 + 5 2 + 53 
A -j . , a/ A; 



i= l V J j=i V 

where we show below that each term E|Si| 2 , EjS^I 2 and E | ^3 1 2 is bounded by 
m rii2 m \ 2 2 m 1 

c{Z l 4}{Z^+Zl}-*-mMixe+s}. (A.i7) 

for some constant C > 0. Consequently, the inequality (A. 15) follows from (A. 17) by using 
Vj Xj for all j E N and 'Ylj v j = A- Consider 5i. First by using the Cauchy-Schwarz 
inequality together with the elementary inequality 1/2 ^ \Xj/Xj — 1| 2 + | A^- / A ^ | 2 we obtain 

m r 7 1 2 m m 

e|5x| 2 - i{Eir mTnA') 112 } {J2 x h 2 - ii 8 ) 1/2 +E( E fcA,- - ii 4 ) 1/2 }- 
j=i 1 j= i j= i 

Thereby, the bound (A. 17) follows from (A. 18) and (A. 19) in Lemma A. 2. Let us evaluate 
£2- The Cauchy-Schwarz inequality together with Xj ^ 2/ a for all j = 1, . . . , m implies 



2 <2<fv!^ 



EIS2I «S 2 



lZ 1 A i 

.7=1 J 



rn 

^l^l 4 ) 1 / 2 } {^IFCA.VA,- < 1/2)! 1 / 2 }. 



3=1 



We thus get the bound (A. 17) by using (A. 18) and (A. 20) in Lemma A. 2. Consider 53. 
Define V 2 := [h]]/Xj and s G K m with Sj := [h]j / (V . Clearly s G S m and 

53 = V YljLi s j T n ,j- Consequently, the bound (A. 17) follows from (A. 18) in Lemma A. 2. 

The proof of (A. 16) is based on the identity (A. 14) where we again bound each summand 
separately. First, by using the Cauchy-Schwarz inequality we conclude | ^2j >m [h]j [/%| 2 
^mWm 1 !!/^! 2 /!!^!! 2 - O n the other hand, applying the Cauchy-Schwarz inequality together 
with Xj ^ 2a for all j = 1, . . . , m implies 



m 2 m m 

e|j>l, ift < «}| < {E^}{E^ 2p (^/ A i < V2)} < c 



2 n 1 77 



3=1 J'=l 



for some C > 0, where the last inequality follows from (A. 20) in Lemma A. 2. Combining 
the two bounds we obtain (A. 16), which completes the proof. □ 



The finitely and infinitely smoothing case.

Proof of Proposition 3.3. Observe that $\mathcal{W}_p^\rho = \mathcal{F}_\gamma^\rho$ and $\mathcal{W}_s^\tau = \mathcal{F}_\omega^\tau$ with $\gamma = (\gamma_j)_{j\geq 1}$
and $\omega = (\omega_j)_{j\geq 1}$ given by $\gamma_1 := 1$, $\gamma_j := |j|^{2p}$ and $\omega_1 := 1$, $\omega_j := |j|^{2s}$, $j \geq 2$, respectively.
Obviously, the sequences $\gamma$, $\omega$ and $v$ given in (i) by $v_1 = 1$, $v_j = |j|^{-2a}$ and (ii) by $v_1 = 1$, $v_j =
\exp(-|j|^{2a})$, $j \geq 2$, satisfy Assumption 2.1. Furthermore, in case (i) we have $\gamma_m/v_m =
m^{2a+2p}$. It follows that $m_*$ and $\delta_*$ given in (2.4) of Theorem 2.1 satisfy $m_* \sim n^{1/(2p+2a)}$ and
$\delta_* \sim n^{-(p+s)/(p+a)}$, respectively. On the other hand, $\gamma_m/v_m = m^{2p}\exp(m^{2a})$ implies in
case (ii) that $m_* \sim (\log n)^{1/(2a)}$ and $\delta_* \sim (\log n)^{-(p+s)/a}$. Consequently, the lower bounds in
Proposition 3.3 follow by applying Theorem 2.1. □

Proof of Proposition 3.4. Since the condition on $m$ and $\alpha$ ensures in both cases that
$m \sim m_*$ and $\alpha \sim n$ (see proof of Proposition 3.3) the result follows from Theorem 3.2. □



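Technical assertion.

Lemma A.2. Suppose $X \in \mathcal{X}_\eta^{4k}$ and $\varepsilon \in \mathcal{E}_\eta^{4k}$, $k \in \mathbb{N}$. Then for some constant $C > 0$ only
depending on $k$ we have

$$\sup_{m\in\mathbb{N}}\ \sup_{s\in S^m} E\Big|\sum_{j=1}^m s_j T_{n,j}\Big|^{2k} \leq C n^{-k}\,(\|\beta\|^2E\|X\|^2 + \sigma^2)^k\,\eta, \qquad (A.18)$$

$$\sup_{j\in\mathbb{N}} E|\hat\lambda_j/\lambda_j - 1|^{2k} \leq C n^{-k}\,\eta, \qquad (A.19)$$

$$\sup_{j\in\mathbb{N}} P(\hat\lambda_j/\lambda_j < 1/2) \leq C n^{-k}\,\eta. \qquad (A.20)$$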

i&i 

Proof. Let s £ S m and denote := ((3,Xi) - [P]j[Xi]j. Then by the definition of T n j 
given in (A. 10) we have 



m ^ n m 



j=i i=i j=i 



where we bound below each summand separately, that is 

E|Si| 2fc ^ C ■ n~ k ■ ||/3|| 2fe • (E||X|| 2 ) fc • rj, (A.21) 



E\S 2 r ^ C ■ n~ k ■ a 2k ■ r] (A.22) 

for some C > only depending on k. Consequently, the inequality (A.18) follows from 
(A.21) and (A.22). Consider S±. The random variables • s j[Zi]jd,j)i^i^n, are i.i.d. with 
mean zero. From Theorem 2.10 in Petrov [1995] we conclude E|Si| 2fc ^ Cn~ k K\ ^ ■ Sj[Zi]j(^±j\ 2k 
for some constant C > only depending on k. Then we claim that (A.21) follows in case of 
5i from the Cauchy-Schwarz inequality together with X\ S X^ k , i.e., E| Y^j=i s j[ z i]j\ 4k ^ f] 
and sup; gN E|[Zi];| 4fc ^ r\. Indeed, we have 



E| jS^Ki,;! 2 * *S 2 2fe ~ 1 j (E| (f3, Xi) | 4fc ) 1/2 (E| ^ sj [Zi]^) 1 / 2 
3=1 3=1 



ni m 

(E|^[/3] 2 [A 1 ] 2 | 2fc ) 1 / 2 (E|^ S 2 [Z 1 ] 2 | 2fc ) 1 / 2 } 



j'=i i=i 
i4fc ^ ii/3ii4fev^ \ \ "rr 2 fe ir-71 12 ^ n on4fc nun -^n2\2fc , 



where E|(/3, Ad)] 4 * < ||/3|| Eji A ii ' ' ' Ej 2fc A j 2fe Ilz=i \[ z i]ji\ < ||/3|| 4fe (E||X|| 2 ) 2 S by us- 

r=i/?[^]?i " 



ing E||X|| 2 = 53. Xj and in an analogous manner E| £™ f3 2 AXiW k ||/3|| 4fc (E||X|| 2 ) 2fc 
and $E|\sum_{j=1}^m s_j^2[Z_1]_j^2|^{2k} \leq \eta$. Consider $S_2$. (A.22) follows in analogy to the case of $S_1$,
because $(\sigma\sum_j s_j[Z_i]_j\varepsilon_i)_{1\leq i\leq n}$ are i.i.d. with mean zero. Note that $X_1 \in \mathcal{X}_\eta^{4k}$ and $\varepsilon \in \mathcal{E}_\eta^{4k}$ imply
together $E|\sigma\sum_j s_j[Z_1]_j\varepsilon_1|^{2k} \leq \sigma^{2k}\eta$.

Proof of (A.19). From the identity $\hat\lambda_j/\lambda_j = (1/n)\sum_{i=1}^n [Z_i]_j^2$ the result follows by applying
Theorem 2.10 in Petrov [1995], because $([Z_i]_j^2 - 1)_{1\leq i\leq n}$ are i.i.d. with mean zero, and
$E|[Z_i]_j|^{4k} \leq \eta$.

To deduce (A.20) from (A.19) by applying Markov's inequality, observe that $P(\hat\lambda_j/\lambda_j < 1/2) \leq
P(|\hat\lambda_j/\lambda_j - 1| \geq 1/2)$, which proves the lemma. □
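The Markov step from (A.19) to (A.20) can be illustrated by a small simulation. This is only a sketch under an added assumption: the standardized scores $[Z_i]_j$ are drawn as independent standard Gaussians, which the lemma does not require, and the function name is hypothetical. The second returned value is the Markov bound $2^{2k}$ times the empirical $2k$-th absolute moment of the ratio minus one.

```python
import random


def markov_bound_demo(n, k, reps, seed=0):
    """Compare the empirical frequency of {ratio < 1/2} with its Markov bound.

    Each ratio mimics the empirical over true eigenvalue, (1/n) * sum of Z_i^2,
    for standardized scores Z_i (standard Gaussian here, an assumption).
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        ratios.append(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n)
    prob = sum(r < 0.5 for r in ratios) / reps
    moment = sum(abs(r - 1.0) ** (2 * k) for r in ratios) / reps
    return prob, 2 ** (2 * k) * moment
```

Since the event that the ratio is below $1/2$ is contained in the event that it deviates from one by at least $1/2$, the first returned value never exceeds the second.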

A.3 Proofs of Section 4

We begin by defining and recalling notations to be used in the proofs of this section. Given
$m > 0$, $\beta_m \in \Psi_m$ denotes a Galerkin solution of $g = T_{cov}\beta$, i.e.,

$$\|g - T_{cov}\beta_m\| \leq \|g - T_{cov}\tilde\beta\|, \quad \forall\,\tilde\beta \in \Psi_m. \qquad (A.23)$$

Since $T_{cov}$ is strictly positive it follows that $[\beta_m]_m = [T_{cov}]_m^{-1}[g]_m$ is the unique Galerkin solution
satisfying $[T_{cov}(\beta - \beta_m)]_m = 0$. Furthermore, we use the notations



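$$[\hat T_{cov}]_m = \frac{1}{n}\sum_{i=1}^n [X_i]_m[X_i]_m^t, \quad [\tilde X_i]_m := [T_{cov}]_m^{-1/2}[X_i]_m, \quad [\tilde T_{cov}]_m := \frac{1}{n}\sum_{i=1}^n [\tilde X_i]_m[\tilde X_i]_m^t,$$

$$[\Xi_n]_m := [\tilde T_{cov}]_m - [I]_m, \quad [V_n]_m := \frac{1}{n}\sum_{i=1}^n \langle X_i, \beta - \beta_m\rangle[X_i]_m, \quad [W_n]_m := \frac{\sigma}{n}\sum_{i=1}^n \varepsilon_i[X_i]_m,$$

$$\text{and } [Z_n]_m := [V_n]_m + [W_n]_m = [\hat g]_m - [\hat T_{cov}]_m[\beta_m]_m, \qquad (A.24)$$

where $E[V_n]_m = [T_{cov}(\beta - \beta_m)]_m = 0$, $E[W_n]_m = 0$, hence $E[Z_n]_m = 0$, and furthermore
$E[\hat T_{cov}]_m = [T_{cov}]_m$, $[\hat T_{cov}]_m = [T_{cov}]_m^{1/2}\{I + [\Xi_n]_m\}[T_{cov}]_m^{1/2}$, thus $E[\Xi_n]_m = 0$. Moreover, let
us introduce the events

$$\Omega := \{\|[\hat T_{cov}]_m^{-1}\| \leq \alpha\}, \quad \Omega_{1/2} := \{\|[\Xi_n]_m\| \leq 1/2\},$$

$$\Omega^c := \{\|[\hat T_{cov}]_m^{-1}\| > \alpha\} \ \text{ and } \ \Omega_{1/2}^c := \{\|[\Xi_n]_m\| > 1/2\}. \qquad (A.25)$$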
Observe that $\Omega_{1/2} \subset \Omega$ in case $\alpha \geq 2\|[T_{cov}]_m^{-1}\|$. Indeed, if $\|[\Xi_n]_m\| \leq 1/2$ then the identity
$[\hat T_{cov}]_m = [T_{cov}]_m^{1/2}\{I + [\Xi_n]_m\}[T_{cov}]_m^{1/2}$ implies by the usual Neumann series argument that
$\|[\hat T_{cov}]_m^{-1}\| \leq 2\|[T_{cov}]_m^{-1}\|$. Thereby, if $\alpha \geq 2\|[T_{cov}]_m^{-1}\|$, then we have $\Omega_{1/2} \subset \Omega$. These
results will be used below without further reference.

We shall prove at the end of this section two technical Lemmata (A.3 and A.4) which
are used in the following proofs.
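The Neumann series argument used above can be checked numerically on a toy example. This is a minimal sketch with hypothetical names: a strictly positive matrix plays the role of the population matrix, and a symmetric perturbation of spectral norm exactly $1/2$ plays the role of the standardized deviation, so that the perturbed matrix is the product of the square root, the identity plus perturbation, and the square root again.

```python
import numpy as np


def neumann_check(m=5, seed=0):
    """Return (norm of inverse of perturbed matrix, twice norm of inverse of T)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(m, m))
    T = A @ A.T + np.eye(m)                  # strictly positive "population" matrix
    w, U = np.linalg.eigh(T)
    T_half = U @ np.diag(np.sqrt(w)) @ U.T   # symmetric square root of T
    B = rng.normal(size=(m, m))
    Xi = (B + B.T) / 2.0
    Xi *= 0.5 / np.linalg.norm(Xi, 2)        # rescale to spectral norm 1/2
    T_hat = T_half @ (np.eye(m) + Xi) @ T_half
    lhs = np.linalg.norm(np.linalg.inv(T_hat), 2)
    rhs = 2.0 * np.linalg.norm(np.linalg.inv(T), 2)
    return lhs, rhs
```

The first returned value does not exceed the second, because the inverse of the identity plus a perturbation of norm at most $1/2$ has norm at most $1/(1 - 1/2) = 2$.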

Proof of the consistency.

Proof of Proposition 4.1. Let $\tilde\ell_h := \ell_h(\beta_m)\,1\{\|[\hat T_{cov}]_m^{-1}\| \leq \alpha\}$. Then the proof is
based on the decomposition

$$E|\hat\ell_h - \ell_h(\beta)|^2 \leq 2\{E|\hat\ell_h - \tilde\ell_h|^2 + E|\tilde\ell_h - \ell_h(\beta)|^2\}. \qquad (A.26)$$

Since $\alpha \geq 2\|[T_{cov}]_m^{-1}\|$ it follows that $\Omega^c \subset \Omega_{1/2}^c$ and hence

$$E|\tilde\ell_h - \ell_h(\beta)|^2 \leq 2\{|\ell_h(\beta - \beta_m)|^2 + |\ell_h(\beta_m)|^2\,P(\Omega_{1/2}^c)\}. \qquad (A.27)$$

On the other hand we show below for some constant $C > 0$ the following bound

E|4 - CI 2 < C ■ (1/fOPt [TcovL 1/2 || 2 {a 2 + - /3 m || 2 E||X|| 2 } 

{m7 ? -V2 ( p (fi C /2)) l/2 + a 2 m 3 n -l^-l/2 (p( ^ /2)) l/2|| [Tcov y | 2 + (A>2g) 

where by applying Markov's inequality (A.33) in Lemma A.3 implies $P(\Omega_{1/2}^c) \leq C\eta\,m^4/n^2$
for some $C > 0$. Moreover, $\|[T_{cov}]_m\|^2 \leq \|T_{cov}\|^2$ and $\|[h]_m^t[T_{cov}]_m^{-1/2}\|^2 \leq \alpha\|h\|^2$ since
$\alpha \geq 2\|[T_{cov}]_m^{-1/2}\|^2$, which by combination of (A.27) and (A.28) leads to the estimate

$$E|\hat\ell_h - \ell_h(\beta)|^2 \leq C\Big\{|\ell_h(\beta - \beta_m)|^2 + |\ell_h(\beta_m)|^2\,(m^4/n^2)\,\eta$$

$$+ (\alpha/n)\,\|h\|^2\,\eta\,\{\sigma^2 + \|\beta - \beta_m\|^2E\|X\|^2\}\,\big[(\alpha^2m^2/n)\,\|T_{cov}\|^2 + 1\big]\,(m^3/n)\Big\} \qquad (A.29)$$

for some $C > 0$. Furthermore, for each $\beta \in L^2[0,1]$, we have $\|\beta - \beta_m\| = o(1)$ as $m \to \infty$,
which can be realized as follows. Since $\|\Pi_m^\perp\beta\| = o(1)$ as $m \to \infty$ by using Lebesgue's
dominated convergence theorem, the assertion follows from the identity $[\Pi_m\beta - \beta_m]_m =
-[T_{cov}]_m^{-1}[T_{cov}\Pi_m^\perp\beta]_m$ by using that $\|\Pi_m\beta - \beta_m\| \leq \|\Pi_m^\perp\beta\|\,\sup_m\|(T_{cov})_m^{-1}\Pi_mT_{cov}\Pi_m^\perp\| =
O(\|\Pi_m^\perp\beta\|)$. Consequently, the conditions on $m$ and $\alpha$ ensure the convergence to zero as
$n \to \infty$ of the bound given in (A.29), which proves the result.

Proof of (A. 28). From the identity [g]m — [Tcov}m[Pm]m = [Zn]m it follows that 

E|4 - 4(/3)| 2 = Ep^fT^y 1 + [T cov ]^([T cov ] m - [feo,]™)^]^ 1 } [Z n ] m f l n . 

Since 2||[T cov ]~ 1 || ^ a we have ri 1 / 2 C and hence by using llfT^oy]" 1 !! 2 tn ^ a 2 and 
+ [EjmJ" 1 !! 1q 1/2 ^ 2 we obtain 

E|4 - 4(/?)| 2 < 4[E||[C [T cm ]^[Z n ]J\ 2 + ||[^ [Teov]" 1 / 2 !! 2 { 

(e|| [T cov y 1/2 [z n y} || 4 ) 1/2 (P(r2i /2 )) 1/2 

W||[T cov y|| 2 (E||[~ r jyY^ 

+ 4(E||[H n yi 4 )V 2 (E||[r cov ]„ 1 / 2[Znk} ||4 ) i/2j- ; 

where Epy[T cov y[Zjy 2 < || [hf m [T cov ]„ 1/2 || 2 sup zeS ™ E||z* [T cov y 1/2 [Z n y | 2 . From 
(A. 31) - (A. 33) in Lemma A. 3 follows then (A. 28), which completes the proof. □ 

Proof of Corollary 4.2. Due to (A.37) in Lemma A.4 the link condition $T_{cov} \in \mathcal{N}_v^d$
implies $2\|[T_{cov}]_m^{-1}\| \leq 8d^3/v_m = \alpha$. Thus, from (A.29) in the proof of Proposition 4.1 follows

$$E|\hat\ell_h - \ell_h(\beta)|^2 \leq C\Big\{|\ell_h(\beta - \beta_m)|^2 + |\ell_h(\beta_m)|^2\,(m^4/n^2)\,\eta$$

$$+ \frac{d^3}{v_m n}\,\|h\|^2\,\eta\,\{\sigma^2 + \|\beta - \beta_m\|^2E\|X\|^2\}\,\Big[\frac{d^6m^2}{v_m^2 n}\,\|T_{cov}\|^2 + 1\Big]\,(m^3/n)\Big\} \qquad (A.30)$$

for some $C > 0$. Furthermore, the link condition $T_{cov} \in \mathcal{N}_v^d$ implies $\|(T_{cov})_m^{-1}\Pi_mT_{cov}\Pi_m^\perp\|^2 =
\sup_{\|\beta\|=1}\|\Pi_m\beta - \beta_m\|^2 \leq 2(1 + d^2)$ for all $m \in \mathbb{N}$ by using the identity $[\Pi_m\beta - \beta_m]_m =
-[T_{cov}]_m^{-1}[T_{cov}\Pi_m^\perp\beta]_m$ and the estimate (A.40) in Lemma A.4. Thus, $\|\beta - \beta_m\| = o(1)$ as
$m \to \infty$ for all $\beta \in L^2[0,1]$. Consequently, the conditions on $m$ and $\alpha$ ensure the
convergence to zero of the bound given in (A.30) as $n \to \infty$, which proves the result. □



Proof of the upper bound.

Proof of Theorem 4.3. Our proof starts with the observation that the link condition
$T_{cov} \in \mathcal{N}_v^d$ implies $2\|[T_{cov}]_m^{-1}\| \leq 8d^3/v_m$, $\|[\mathrm{Diag}(\omega)]_m^{-1/2}[T_{cov}]_m^{-1/2}\|^2 \leq 4d^3\sup_{1\leq j\leq m}\{\omega_j^{-1}v_j^{-1}\}$
and $\|[T_{cov}]_m\|^2 \leq d^2$ by using the estimates (A.37), (A.38) and (A.39) in Lemma A.4,
respectively. Hence, under Assumption 2.1 we have $\|[h]_m^t[T_{cov}]_m^{-1/2}\|^2 \leq 4d^3D\tau\,v_m^{-1}\max(\omega_m^{-1}, v_m)$
for all $h \in \mathcal{F}_\omega^\tau$. Moreover, since $X \in \mathcal{X}_\eta^{4k}$ it follows that $E\|X\|^2 \leq d\Lambda$ and from (A.33) in
Lemma A.3 by applying Markov's inequality that $P(\Omega_{1/2}^c) \leq C\eta\,m^{2k}/n^k$ for some $C > 0$.
Furthermore, by using the definition of $m_*$, i.e., $1/v_{m_*} \leq n\Delta/\gamma_{m_*}$, the condition $m = m_*$
implies $\alpha = n\max(1, 8d^3\Delta/\gamma_{m_*}) \geq 2\|[T_{cov}]_{m_*}^{-1}\|$ and $\alpha/n \leq 8d^3\Delta$. Therefore, from (A.27)
and (A.28) in the proof of Proposition 4.1 follows



E|4 - 4(/3)| 2 < c{|4(/3 - f3 m J\ 2 + |4(/3mJ| 2 r/ 



d 3 DTi 1 {a 2 + \\f5-f5 mt \\ 2 dK} 



m 2k 1 
— * — I 

n k nv n ^ 

l+fe/2 -i+k •? 

TO* , a . 2 TO 



n fe/2 n k/2-l n 



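for some $C > 0$. Moreover, the definition of $m_*$ and $\delta_*$ implies $(1/(n v_{m_*}))\max(\omega_{m_*}^{-1}, v_{m_*}) \leq
\Delta\max(\delta_*, 1/n)$ and together with (A.41) and (A.42) in Lemma A.4 $\|\beta - \beta_{m_*}\|^2 \leq 10d^4\rho/\gamma_{m_*}$
and $|\ell_h(\beta - \beta_{m_*})|^2 \leq 10Dd^4\rho\tau\,\delta_*$ for all $\beta \in \mathcal{F}_\gamma^\rho$ and $h \in \mathcal{F}_\omega^\tau$. Consequently, we have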
E|4 - 4 (/3)| 2 < C m&x(5* n , 1/n) A 3 d 11 Drj P T{a 2 + pd 5 A/ 7m . + 1} 

?k — l+fc/2 ^ 

mz n to* m: 
1 H 1 1 1 

max(<5*,l/n) n k / 2 n k / 2 ~ 1 n 

Thereby, the result follows from the condition (4.2) which ensures that the factor in brackets
is bounded as $n \to \infty$, which completes the proof. □

The finitely and infinitely smoothing case.

Proof of Proposition 4.4. Observe that in both cases the condition (4.2) is satisfied,
where in part (i) it follows from the additional assumption $p + a \geq 3/2$. Since the condition
on $m$ and $\alpha$ ensures again in both cases that $m \sim m_*$ and $\alpha \sim n$, the result follows
from Theorem 4.3. □



Technical assertions.

The following two lemmata gather technical results used in the proof of Proposition 4.1 and
Theorem 4.3.

Lemma A.3. Suppose $X \in \mathcal{X}_\eta^{4k}$ and $\varepsilon \in \mathcal{E}_\eta^{4k}$, $k \in \mathbb{N}$. Then for some constant $C > 0$ only
depending on $k$ we have

I m 2fc xfc 

sup E V^dTeo.]- 1 / 2 ^), ^Cn- k (\\P-p m \\ 2 E\\X\\ 2 + a 2 ) V , (A.31) 

zeS™ v J 

k 

n [T C ovL 1/2 T n , m || 2fc ^ C ^ {||/3 - /3 m || 2fc (E||X|| 2 ) fc + <r 2fe } 77, (A.32) 



Ik 

E\\E n , m \\ 2k ^C- V ~, (A.33) 



,2k 



E||{[T cov ]„ - [f^U^L 1 / 2 !!^ ^ C ■ r, ■ ^ • (E\\X\\ 2 ) k (A.34) 



PROOF. Let z G S m and denote Cij := </3 - = (0 - /3 m , X^T^^yX^j. 

Then by the definition of T njTn given in (A. 24) we have 

m ^ n m 

zj ■ ([T cov ]^ /2 T ntm )j = - Zj{&j + <rei[Xi]j] =■ Si + S 2 , 

j=i i=i j=i 

where we bound below each summand separately, that is 

E|5i| 2fc < C ■ n~ k ■ ||/3 - p m \\ 2k ■ (E||X|| 2 ) fc • n, (A.35) 
E\S 2 \ 2k ^ C ■ n~ k ■ a 2k ■ n (A.36) 

for some C > only depending on k. Consequently, the inequality (A.31) follows from 
(A.35) and (A.36). Consider Si. Since E(/3 - 0m, Xi)[Xi]n± = [T cov (/3 - /3 m )]„ = [g] m - 
[T C ov}m[Pm]m = for all 1 ^ i ^ n, it follows that the random variables ZjCij), 
i = 1, . . . ,n, are i.i.d. with mean zero. From Theorem 2.10 in Petrov [1995] we conclude 
E|Si| 2fc Cn~ k E\ z jCi,j\ 2k f° r some constant C > only depending on k. Then we 
claim that (A.35) follows from the Cauchy-Schwarz inequality together with X\ G X^ k , i.e., 
E| Ef=i Zj[Xi]j\ 4k < »y and su PjeN E| [X!]^^]) 7 /!^ < ? ? . Indeed, we have 

2k 

E\(P-p m ,Xi)\ 4k < - /3 m || 4fe ^[^oo.]^^ • • • ^[^co V ] i2fej2fc lEn l^iliz/^o.]^! 2 

ji hk '=1 

||/3-/3J| 4fc (E||X|| 2 ) 2 S. 

and hence 

m m 

nY. z ^A 2k < (E|(/3-/3 m ,X 1 )| 4fc ) 1 /2( E |^ Zi [l 1 ] J |^)V2 

3=1 3=1 

<:||/3-/3 m || 2fc (E||X|| 2 )S. 

Consider S 2 - Since {osi Zj[Xj]j) are i.i.d. with mean zero, (A.36) follows in analogy 
to the proof of (A.35) by using E|e a £V Zj[Xi]j\ 2k < ?? for all Zi G X^ k and e x G £ 4fc . 
To deduce (A.32) from (A.31) we use that 

m 2k | m 2k 

E||[T cov ]- 1/2 T„ im || 2fc < m k ~ l ([T^]" 1 ^ ^ < m fc sup E V z^r^]- 1 ^ ) 

i=i ' ~ 2£§m 'i=i 
Proof of (A.33). From the identity $(\Xi_{n,m})_{j,l} = (1/n)\sum_{i=1}^n\{[\tilde X_i]_j[\tilde X_i]_l - \delta_{jl}\}$, where $[\tilde X_i]_m := [T_{cov}]_m^{-1/2}[X_i]_m$ and $\delta_{jl} = 1$
if $j = l$ and zero otherwise, we conclude $E|(\Xi_{n,m})_{j,l}|^{2k} \leq Cn^{-k}E|[\tilde X_1]_j[\tilde X_1]_l - \delta_{jl}|^{2k}$. Thus
$X \in \mathcal{X}_\eta^{4k}$ implies $E\|\Xi_{n,m}\|^{2k} \leq m^{2(k-1)}\sum_{j,l=1}^m E|(\Xi_{n,m})_{j,l}|^{2k} \leq Cm^{2k}n^{-k}\eta$.

The estimate (A.34) follows by using $\{[T_{cov}]_m - [\hat T_{cov}]_m\}[T_{cov}]_m^{-1/2} = -[T_{cov}]_m^{1/2}\,\Xi_{n,m}$ from
(A.33), which completes the proof. □

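The next lemma is partially shown in Cardot and Johannes [2008].

Lemma A.4. Suppose the sequences $\gamma$, $\omega$ and $v$ satisfy Assumption 2.1. Let $T \in \mathcal{N}_v^d$. Then

$$\sup_{m\in\mathbb{N}}\{v_m\|[T]_m^{-1/2}\|^2\} \leq \{2d^2(2d^4 + 3)\}^{1/2} \leq 4d^3, \qquad (A.37)$$

$$\sup_{m\in\mathbb{N}}\|[T]_m^{-1/2}[\mathrm{Diag}(v)]_m^{1/2}\|^2 \leq \{2d^2(2d^4 + 3)\}^{1/2} \leq 4d^3, \qquad (A.38)$$

$$\sup_{m\in\mathbb{N}}\|[T]_m^{1/2}[\mathrm{Diag}(v)]_m^{-1/2}\|^2 \leq d. \qquad (A.39)$$

If in addition $\beta_m$ denotes a Galerkin solution of $g = T\beta$ then

$$\sup_{m\in\mathbb{N}}\Big\{\sup_{\|\beta\|=1}\|\Pi_m\beta - \beta_m\|^2\Big\} \leq 2(1 + d^2), \qquad (A.40)$$

and in case $\beta \in \mathcal{F}_\gamma^\rho$ is additionally satisfied then

$$\sup_{m\in\mathbb{N}}\{\gamma_m\|\beta - \beta_m\|^2\} \leq 2(2d^4 + 3)\rho \leq 10d^4\rho. \qquad (A.41)$$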

7716N 

Suppose additionally that $h \in \mathcal{F}_\omega^\tau$, then we have

sup{ 7m [m a x(u-\v 2 m )]- l \(h,(3- [3 m )\ 2 } ^2D(3 + 2d 4 )pr ^WDd 4 pr. (A.42) 

7716N 

Proof. The estimates (A.37) - (A.39) are given in Lemma A.3 in Cardot and Johannes
[2008]. Furthermore, from (A.19) and (A.20) in Lemma A.3 in Cardot and Johannes [2008]
follows (A.40) and (A.41). We start our proof of (A.42) with the observation that the link
condition $T \in \mathcal{N}_v^d$ implies that $T$ is strictly positive and that for all $|s| \leq 1$ by using the
inequality of Heinz [1951]

^ 2W II/H 2 2 ^l|r7l| 2 ^d 2|s| ||/I| 2 2s - (A.43) 

Thus, by using successively the first inequality of (A.43), the Galerkin condition (A.23) and
the second inequality of (A.43), we obtain

11/3 - PmC ^ d 2 \\T((3 - I3 m )\\ 2 ^ d 2 \\T(P - U m (3)\\ 2 < A\P ~ n m Pf v 2 (A.44) 

Since G and (■j~ 1 v 2 )j^ is monotonically decreasing, (A.44) implies ||/3 — (3 m \\ 2 2 ^ 
d 4 \\p - n m /3|| 2 2 ^ d^vlWPW 2 and hence, 

||n m /3 - /3 m || 2 2 < 2(11/3 - /? m || 2 2 + ||/3 - n m /3|| 2 2 } ^ 2(1 + d^vlWPW 2 . (A.45) 

Finally, by applying the Cauchy-Schwarz inequality we have 

\(h,(3-U m (3)\ 2 Wt,^ llCll/3||? (A.46) 
and by using (A.45) it follows

\(h,U m (3 - I3 m )\ 2 < \\h\\l || [Diag(c^)]^ 1 / 2 [Diag(^)]^ 1 [| 2 1| (n m /3 - /3 m )||^ 2 

^2{l + d*) 1 -W m { sup (A.47) 

The estimate (A. 42) follows now from (A. 46) and (A.47) since under Assumption 2.1 there 
exists a constant D such that Vm{ su Pi^j^m V( a; i t '|)} ^ ^ max ( a; m 1 ! ^m) f° r an m € N, 
which completes the proof. □ 